The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 
curriculum, learning, teaching, supervision, measurements, 
statistics, and experimental techniques. 


Vol. XXVII September 1958 No. 1 
=== 


CONTENTS 


Intercorrelations and Factor Analysis of Tests Given to Teaching Can-
didates J. C. Gowan 1


Application of Analysis of Variance to the Estimation of the Reliability of
Observations of Teachers’ Classroom Behavior
Donald M. Medley and Harold E. Mitzel 23


A Descriptive Analysis of a Departmental Curriculum Improvement Proj- 
ect in an Urban Junior High School William M. Rasschaert 37 


Literal and Critical Reading in Social Studies E. Elona Sochor 49 
Literal and Critical Reading in Science Ethel S. Maney 57 


Children’s Perceptions of Relationships Among Their Family and Friends 
Ivan N. Mensh and John C. Glidewell 65 


$7.50 A YEAR PUBLISHED QUARTERLY $1.75 A COPY 
Published by Dembar Publications, Inc., 
Madison 3, Wisconsin. 
17, 1938 at the post office at Madison, 
Wisconsin, under the act of March 3, 1879. 




EDITORIAL BOARD 


A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis.


Jacob O. Director, Research Services, 
Universicy, 


e, Illinois. 
on university 
Arthur T. Jersild, Professor 
lege, Columbia University, New York le eee | 
ble for materials child 
Palmer O. Johnson, Professor University of 
Minneepels Minnesota, respon 


of 
tion of the City of New York, 
Brooklyn, 
e for on cur- 

each June. 


CONTRIBUTING EDITORS 


Betts Haver- 


Leo J of Bincation, University 


Harold D. Carrer, Associate, Professor of Education, vat 


Jom c. of Psychology, of 

Toronto, Toronto, Canada. 


niversity, 


d E. Professor of Psychology and Director, 
is Child Welfare, University of California, 
' Berkeley 4, California. 


Noel Keys, Professor of Education and Lecturer in Human 
Relations, University of California, Berkeley, California. 

D. Wel . Professor of Education, University of 
Los Angeles, California. 


Lincoln, Consulting Psychologist, Halifax, 
jusetts. 


Massach 
Irving Lorge, Professor of Education, Executive Officer,
Institute of Psychological Research, Teachers College,
Columbia University, New York 27, New York. 


tional Research, University 
ounge Building, Gainesville, 


Edward A. 


of Education, University of 


Valworth R. Plumb, Division of Education and
Psychology, University of Minnesota, Duluth,


S. L. Pressey, Professor of Educational Psychology, Ohio
State University, Columbus, Ohio. 


Research, The University Oklahoma, 


Robert T. Rock, Jr., Professor 
r. of of 
sity, New York 


Research Evaluation Psychologist, Lack- 


Louis G. Schmidt, Assistant Professor of Counsel 
I’ of Education, Indiana 


School 


Harold Seashore, Director, Test Di 
cal Corporation, New York 18, New 


David Segel, Educational Consultant, Specialist in T 
and Measurements, Federal Security Agency, U. 
- Office of Education, Washington, D. C. bs 


a y, Professor of Educational Psychology, 

of University, Alabama. 

Helen Thom Associate N 
Pork Post-Graduate Hospital, 30% Hast Street, New 
York 3, N. Y.

Robert L. Associate Professor of Education, 
Teachers College, Columbia University, New York City. 

Herbert A. T Professor of Psychology, Ohio State 
University, Columbus, Ohio. 

Maurice E. Tro: Vice President, Japan International 
Christian University, Tokyo, Japan. J 


Helen M. Walker, Professor of Educat' Teachers Col- 
lege, Columbia University, New York 4 

L. Wel Prof of Child Welfare 

State Uni of to Iowa City, 


M. yan, Emeritus Professor Boston 
niversity, Pine Street, Wellesley Massa- 


Paul A. Witty, Professor of Education, Director of Psycho-
Educational Clinic, School of Education, Northwestern
University, Evanston, Illinois.


D. A. Worcester, Chairman of Educational
Psychology and Measurement, University of Nebraska,


H. H. Remmers, Professor of Psychology, 
Director, Division of Educational Reference, Purdue Univer-
sity, Indiana. responsible mate-
Wayne W: 
ev 
New York. Ecitorially 
riculum construction, 
5 William Reitz, Associate Professor of Educa Coll 
of Education Examiner, Wayne Detroit 
Michigan. 
f 


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, September 1958) 


INTERCORRELATIONS AND FACTOR ANAL- 
YSIS OF TESTS GIVEN TO TEACH- 
ING CANDIDATES 


J. C. GOWAN 
University of California at Los Angeles 


THE STUDY reported below grew out of a 
larger program of assessment and evaluation of 
teaching candidates reported elsewhere (7), sup- 
plemented by smaller samples from other teach- 
er training institutions. It is the purpose of the 
present report to point up certain interrelation- 
ships between testing instruments which may 
have significance beyond the context in which 
the tests were given. 


Sample One 


1. Scope of Study 


This study gives intercorrelations and result- 
ant factor analyses from a matrix of order 62 
produced by the scores of teaching candidates on 
a battery of tests. In addition to the large num-
ber of scales making up the matrix, the study is
of interest because of the sizeable numbers of
subjects involved. These subjects were junior,
senior and graduate students at the University of
California, Los Angeles. The tests adminis-
tered were part of a required series given by
the Teacher Selection and Counseling Service of 
the School of Education. This agency processes 
about 1400 cases per year. 

It is not the purpose of this paper either to 
explore the literature, or engage in discussion 
about criteria of teaching success, or develop 
teacher prognosis scales. These matters are 
handled elsewhere (1,4,5,6,9,10). The design 
is simply to detail as briefly as possible the 
rather extensive intercorrelations and the factor 
analysis results accruing therefrom. 


2. Tests Used and Method of Procedure 


The tests used in this study were the Cooper-
ative English Test, the Stanford Arithmetic Test,
the American Council Psychological Examination,
the Minnesota Multiphasic Personality Inventory,
the California Psychological Inventory, the Re-
vised Study of Values (Allport et al.), the Guilford-
Zimmerman Temperament Survey, and two new
scales on the MMPI alleged to predict teaching 
success or failure. The teaching scales were 
the plus and minus sections of a scale devised by 
the authors and detailed more fully elsewhere (5). 
The three thousand odd correlation coefficients 
needed for the study were obtained from IBM 
cards by a method outlined by J. C. Flanagan (2). 
Briefly, this method consists of determining what 
percent of the top 27 percent criterion group on 
the independent variable lie above the median 
score on the dependent variable, and what percent 
of the bottom 27 percent criterion group on the in- 
dependent variable behave in similar fashion. 
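
The first step of the tails method just described can be sketched as follows. This is only an illustration under stated assumptions: the function name `flanagan_proportions` and the toy scores are hypothetical, the median is taken over the whole sample, and the final conversion of the two percentages into a correlation coefficient (which relies on Flanagan's published chart) is not reproduced here.

```python
def flanagan_proportions(x, y, tail=0.27):
    """Share of the upper and lower `tail` groups on x that score above the median of y."""
    n = len(x)
    k = max(1, round(n * tail))                     # size of each criterion group
    y_sorted = sorted(y)
    mid = n // 2
    # Median of the dependent variable over the whole sample (an assumption).
    median_y = y_sorted[mid] if n % 2 else (y_sorted[mid - 1] + y_sorted[mid]) / 2
    order = sorted(range(n), key=lambda i: x[i])    # indices from lowest to highest x
    lower, upper = order[:k], order[-k:]
    p_upper = sum(y[i] > median_y for i in upper) / k
    p_lower = sum(y[i] > median_y for i in lower) / k
    return p_upper, p_lower


# Hypothetical toy data: two perfectly related score lists.
scores_x = list(range(1, 101))
scores_y = list(range(1, 101))
print(flanagan_proportions(scores_x, scores_y))     # (1.0, 0.0)
```

Because the roles of the independent and dependent variables are not interchangeable in this procedure, swapping `x` and `y` will in general give a slightly different pair of percentages, and hence a slightly different coefficient.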
The names of the scales of the various tests, 
their code symbols, and the variable numbers 
are detailed in Table I. It will be observed that 
the scale codes are grouped so that all scales for 
a certain test have a common initial letter. Care 
has been taken so that the scale numbers corre-
spond (as in the MMPI) to commonly accepted
usage. Validating scales have small letters. A
few of the scales were not used throughout the
study. For example, D5, the Masculinity-Fem-
ininity scale of the MMPI, was not used because
the scoring differs for men and women. No at-
tempt was made to correlate scales D11, D12,
D13 or D14 with any of the later scales, since
Gough used these scales to develop corresponding
scales (E5, E1, E4, E10) on the Psychological In-
ventory. The intercorrelations of the A group
with groups beyond D are also missing, since a 
factor analysis showed that practically all the 
variance of this group was being cared for by the 


*Now of Los Angeles State College. The authors are indebted to the University of California for a re-
search grant covering in part the expense of this study. Further acknowledgments are due to Dr. Har-
rison Gough for permission to use the California Psychological Inventory and for scoring it; to Mr. Rob-
ert Rutz and the IBM room personnel of the UCLA Controller’s Office for running the card sorts; to the
staff of the School of Education Teacher Selection and Counseling Service for cooperation and assistance;
to Gordon Fifer, Fred Machetanz and Enid Janssen, who assisted in the computation; and to Dorothy
Kern, who drew the figures.




TABLE I 


NAMES AND CODE SYMBOLS FOR SCALES USED IN INTERCORRELATION STUDY I 


Code   Name of Scale

A      Cooperative and Stanford
A1     Cooperative English Vocabulary
A2     Cooperative English Speed
A3     Cooperative English Level
A4     Cooperative English Mechanics
A5     Cooperative English Effectiveness
A6     Stanford Arithmetic (Part 1)

B      American Council Psychological
B1     Q: Quantitative
B2     L: Linguistic
B3     Total

C      Allport Study of Values
C1     T: Theoretical
C2     E: Economic
C3     A: Aesthetic
C4     S: Social
C5     P: Political
C6     R: Religious

D      Minnesota Multiphasic Inventory
Da     L: Lie
Db     F: Falsification
Dc     K: Suppressor Variable
D1     Hs: Hypochondriasis
D2     D: Depression
D3     Hy: Hysteria
D4     Pd: Psychopathic Deviate
D6     Pa: Paranoia
D7     Pt: Psychasthenia
D8     Sc: Schizophrenia
D9     Ma: Hypomania
D10    Si: Social Introversion
D11    Do: Dominance
D12    Re: Social Responsibility
D13    St: Status
D14    Academic Achievement

E      California Psychological Inventory
Ea     Infrequency
Eb     Gi: Good Impression
Ec     Ds: Dissimulation
E1     Re: Social Responsibility
E2     Tolerance
E3     Fl: Flexibility
E4     St: Status
E5     Do: Dominance
E6     Sp: Social Participation
E7     Femininity
E8     De: Delinquency
E9     Intellectual Efficiency
E10    Academic Achievement
E11    Honor Point Ratio
E12    Psychologist’s Interest
E13    Neurodermatitis
E14    X1: Poise, Self-confidence
E15    X2: Impulsivity

F      Guilford-Zimmerman Temperament
F1     G: General Energy
F2     Restraint
F3     Ascendance
F4     Sociability
F5     Emotional Stability
F6     Objectivity
F7     F: Friendliness
F8     Thoughtfulness
F9     Personal Relations
F10    Masculinity

G      Teacher Prognosis Scales (MMPI)
G1     Tp: Teacher Positive
G2     Tn: Teacher Negative



B group (ACE scores). 

It was not considered feasible to perform a 
factor analysis of the entire matrix, so various 
minors were selected for the purpose. In order 
to present the matrix in a form which can be ac-
commodated on paper of standard size, it was
split up into various minors. A schematic pre-
sentation of the matrix with respect to this sec-
tioning is shown in Tables II and III. Table II
shows the number of students involved in each 
section, from which it will be seen that far fewer 
cases were covered in the last three groups. Ta- 
ble III indicates what parts of the master matrix 
are displayed in future tables indicating specific 
minors of the determinant. 

Tables IV to VII inclusive present various 
minors of this matrix. Tables VIII to XI, inclu- 
sive, present factor analyses resulting from this 
material. The remaining tables in the paper 
concern different samples of a much smaller 
magnitude. 


3. Results and Discussion 


The results, so far as the intercorrelations in
Tables IV to VII are concerned, speak for them-
selves. It is not considered feasible to discuss
all the implications raised. Such material, how-
ever, may be valuable to investigators other than
those in education, and is presented with this
in mind.

An interesting empirical check on the stand- 
ard error of measurement for the short cut meth- 
od of obtaining correlation coefficients appears 
worthy of comment. The method utilized (Flan-
agan’s tails method) results in an a_ij which may
be different from a_ji. In Minor I (Table IV) the
difference (a_ij - a_ji) was computed for the 870
coefficients in the 30 x 30 matrix. After elim- 
inating and correcting errors signaled by high 
differences, the following distribution was ob- 
tained: 


Difference
Interval          Frequency

 20 to  22             1
 17 to  19             1
 14 to  16             5
 11 to  13             5
  8 to  10            12
  5 to   7            38
  2 to   4            76
 -1 to   1           141
 -4 to  -2            94
 -7 to  -5            41
-10 to  -8            16
-13 to -11             4
-16 to -14             1

N = 435     M = 0.06     S.D. = 4.92

The mean of this distribution was gratifyingly
near zero; the standard deviation was 4.92. The
standard error of an ‘‘r’’ of zero for an N of
1700 is 2.42 by the usual formula for standard er-
ror of Pearsonian ‘‘r’’. In practice the two values
of each pair were averaged. It may be noted that Flan-
agan (2:347) gives an approximation for what
amounts to the standard error of a correlation co-
efficient obtained in the above manner. In the
case where ‘‘r’’ equals 0, it is 1.3 times the standard
error for the Pearsonian coefficient; for an ‘‘r’’
of .45, it is 1.5 times as much. The maximum
standard error values given in the tables are for 
Pearsonian ‘‘r’s’’. These facts should be taken 
into account in interpreting the tables. 
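
The standard errors quoted above can be checked with a short calculation. The sketch below assumes that the ‘‘usual formula’’ is the large-sample expression (1 - r^2) / sqrt(N), and that Flanagan's multipliers (1.3 and 1.5) apply to the Pearsonian standard error at the same value of ‘‘r’’; the function name `se_pearson_r` is hypothetical.

```python
import math


def se_pearson_r(r, n):
    """Large-sample standard error of a Pearson r: (1 - r^2) / sqrt(n)."""
    return (1.0 - r * r) / math.sqrt(n)


# Standard error of r = 0 for N = 1700 (about .024, i.e. 2.4 with decimals omitted).
se_zero = se_pearson_r(0.0, 1700)

# Approximate standard errors for the tails method, using the multipliers quoted in the text.
se_tails_zero = 1.3 * se_zero
se_tails_45 = 1.5 * se_pearson_r(0.45, 1700)

print(round(se_zero, 4), round(se_tails_zero, 4), round(se_tails_45, 4))
```

Under the same formula, a sample of 86 cases (the size of Sample Three) gives a standard error of about .11 for an ‘‘r’’ of zero.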


4. Factor Analysis 


Factor analysis of various selected minors of 
the material in Sample One was done by centroid 
methods outlined by Thurstone (11). Results for 
a 30 x 30 matrix of variables 1 through 30 are 
shown in Table VIII. This matrix consisted of 
variables of the Cooperative English Test, Amer- 
ican Council Psychological Examination, Allport 
Study of Values, and Minnesota Multiphasic Inven- 
tory. Six factors were extracted. These were 
sufficient to account for more than half the vari- 
ance except on the following scales: all scales of 
the Allport; and Lie, Falsification, Psychopath- 
ic Deviate, Paranoia, Hypomania and Status of 
the MMPI. The factors were left unrotated, at 
least in this initial study, since some of the fac- 
tors (such as factor I, which obviously repre-
sents general intelligence) had considerable psy-
chological significance as they stood. Some ro- 
tation was attempted later, as shown in Table XI. 
The unrotated factors of factor analysis I were 
designated as follows: 


Factor I, with its very high loading on the A.
C.E. (practically to the reliability coefficient)
and with somewhat less high loadings on the Co-
operative English Test, was named ‘‘Intelligence’’.
As can be seen, it is considerably more verbal
than numerical. The only other loadings above .20
on this factor are positive with Theoretical and
Dominance, and negative with the Lie scale. This
one factor takes out so much of the variance of
the first nine variables that only in Factor V is
there found a single loading of .15 or more. The
A.C.E. Total effectively speaks for the other
variables.

Factor II is rather well defined as ‘‘K’’, the 
somewhat mysterious suppressor variable on the
MMPI. Because of the fact that either fractions 
or the full amount of the ‘‘K’’ value is added to 
the Hs, Pd, Pt, Sc and Ma scales, care should 
be used in concluding that correlations with these 
scales fix the description of ‘‘K’’. Of the ‘‘un- 
contaminated scales’’ the order of factor load- 
ings is as follows: Hysteria, Responsibility, 
Paranoia, Lie, Dominance and Status. Moder- 


Frequency (in the order of the difference intervals, from highest to lowest): 1, 1, 5, 5, 12, 38, 76, 141, 94, 41, 16, 4, 1
N = 435; M = 0.06; S.D. = 4.92


TABLE II


SCHEMATIC OF THE MATRIX SHOWING NUMBER OF STUDENTS INVOLVED

(Table body not recoverable; the legible entries are 1700 and 300.)


TABLE III


SCHEMATIC PRESENTATION OF THE MATRIX SHOWING MINORS EXHIBITED 




Code and Test 


Cooperative 


English and Minor I 
Stanford Table IV 


American 
Council 
Psychological 


Allport- 
Vernon Study 
of Values 


Minnesota 
Multiphasic 
Inventory 


California Minor III 


Psychological Table VI 
Inventory 


Guilford- Minor IV 
Zimmerman Table VII 
Temperament 


Teacher 
Prognosis 
Scales (MMPI) 


A B D \ E F G 
Minor I 
E 




TABLE IV 
(Minor I) 


INTERCORRELATIONS BETWEEN COOPERATIVE ENGLISH, AMERICAN COUNCIL PSYCHOLOGICAL,
ALLPORT STUDY OF VALUES AND MINNESOTA MULTIPHASIC INVENTORY*


Cooperative et al . C. E. Allport Values 
Al A2 AS A4 = AS Bi Bo Ci C2 C5 C6 


Code Voc Sp Lev M Eff 


Al 53.555 
A2 53. 62 
A3 46 56 
A4 -- 60 
AS -- 
A6 


Bl 
B2 
B3 


Cl 
C2 
C3 
C4 
C5 
C6 


Da 
Db 
De 
D1 
D2 
D3 
D4 
D6 
D7 
D8 
D9 
D10 
Dll 
D12 


*Decimals omitted throughout 


6 
P R 
29 19 78 65 22 -26 26 £404 «-11~=«-14 
42 48 78 i6 -21 22 £400 -05 -14 
34 34 65 18 #-20 18 -05 -14 
37 36 56 56 Ol -15 17 -13  -06 
37 39 61 61 10 #-11 #2414 -05 -08 
-- 68 44 61 «17 07 00 -04 
-- 48 78 09 03 -04 -02 1 04 
-- 92 23 «422 £400 -06 -15 
- 20 -16 OL -04 -14 -20 
-- -42 -28 20 -18 
aD 
| 


Minnesota Multiphasic Inventory 
D4 Dé Di Ds 
Pd Pa Pt Sc 


01 -04 -05 03 
-07 -07 -06 04 
-05 -05 -07 01 
-05 -O7 -02 00 
-04 -02 -06 
-03 -04 -08 


-08 -10 
-06 -04 
-09 -08 


-04 -06 
-08 
02 08 
08 06 
-07 -08 
09 06 


17 00 
ll 23 
23 18 
28 48 
19 52 
38 36 
31 42 
-- 36 


L F K Hs D Ma Si Do Re St 
-16 00 -02 -09 03 -04 -10 06 23 07 14 
-23 -04 00 -08 -08 -10 -05 03 27 05 13 
-18 -04 02 -08 -06 -09 -10 04 22 07 10 
-06 -07 06 -06 -04 -02 -04 07 15 13 13 
-10 -05 04 -04 -07 00 -07 03 13 04 06 
_ -12 -08 08 -04 -08 -06 -05 -02 10 02 02 
: -11 -07 04 -04 -14 -07 -08 -05 -01 -05 08 01 09 
-20 -06 02 -10 -02 -08 -01 po 03 -04 00 24 14 18 
-02 01 -10 -08 -08 -08 -05 -01 -01 00 20 04 16 
-09 14 -02 -04 10 -03 08 01 -03 04 13 04 10 
-06 -03 -05 -04 -05 -12 -1l -11 01 02 -08 -17 -09 
00 15 -09 -02 14 04 00 10 00 08 08 04 14 
09 -02 09 05 06 06 08 10 -02 01 05 10 04 
: -08 00 -03 -06 -07 -06 00 -07 09 -12 11 -10 08 
11 -22 06 07 -12 02 -04 -04 -02 -02 -23 10 -23 
-- -08 46 24 03 33 11 00 -16 -12 03 47 08 
-- -26 09 28 06 18 30 12 24 -05 -18 -09 
-- 57 -10 51 39 47 -16 -44 31 52 22 
-- 31 +70 40 50 -02 -03 -03 12 -O1 
-- 22 32 26 -22 38 -25 -12 -18 
ae 45 40 -06 -20 10 25 10 
-- 52 16 -11 04 07 06 
36 -02 -05 06 10 00 
: -- 64 08 20 -28 -08 -14 
-- 20 -02 00 07 -02 
-- -20 -04 -28 10 
-- -32 -18 -43 
-- 39 52 
-- 19 


TABLE V
(Minor II)


INTERCORRELATIONS OF AMERICAN COUNCIL PSYCHOLOGICAL, ALLPORT STUDY OF VALUES AND
MINNESOTA MULTIPHASIC INVENTORY WITH CALIFORNIA PSYCHOLOGICAL INVENTORY
AND TEACHING SCALES (MMPI)*

(Table body not recoverable.)

*Decimals omitted throughout.


TABLE VI
(Minor III)


INTERCORRELATIONS OF CALIFORNIA PSYCHOLOGICAL INVENTORY WITH ...

(Table body not recoverable.)


TABLE VII
(Minor IV)


INTERCORRELATIONS BETWEEN ... AND TEACHING PROGNOSIS SCALES VERSUS
... PSYCHOLOGICAL INVENTORY ...

(Table body not recoverable.)


TABLE VIII
(Factor Analysis I)


UNROTATED FACTOR LOADINGS FOR MATRIX OF VARIABLES 1 THROUGH 30*


Code  Independent Variable                    I ... h²


A1    Cooperative English Vocabulary          75 73
A2    Cooperative English Speed               85 74
A3    Cooperative English Level               76 62
A4    Cooperative English Mechanics           69 14 52
A5    Cooperative English Effectiveness       73 05 54
A6    Stanford Arithmetic                     63 04 58

B1    Q: Quantitative                         66 04 63
B2    L: Linguistic                           89 -06 82
B3    Total                                   93 -05 88

C1    T: Theoretical                          20 -49 30
C2    E: Economic                             -18 -19 42
C3    A: Aesthetic                            17 09 -16 34
C4    S: Social                               02 02 22 07
C5    P: Political                            -07 -13 -38 30
C6    R: Religious                            -13 01 66 47

Da    L: Lie                                  -20 -14 19 43
Db    F: Falsification                        -06 38 -35 29
Dc    K: Suppressor Variable                  04 -18 19 74
D1    Hs: Hypochondriasis                     -09 34 08 58
D2    D: Depression                           -08 63 -15 36
D3    Hy: Hysteria                            -08 17 04 60
D4    Pd: Psychopathic Deviate                -05 32 -12 20 47
D6    Pa: Paranoia                            -08 19 05 02 24
D7    Pt: Psychasthenia                       -08 67 03 11 62
D8    Sc: Schizophrenia                       00 49 -07
D9    Ma: Hypomania                           -07 00 -12 57 35
D10   Si: Social Introversion                 02 56 00 -35 57
D11   Do: Dominance                           22 33 -51 -31 -04 53
D12   Re: Social Responsibility               08 47 -33 14 -39 52
D13   St: Status                              15 32 -49 -29 10 48


*Decimals are omitted throughout. For further identification of variables, see Table I.



ate amounts of ‘‘K’’ may be desirable for inte- 
grated ego functioning. The factor is designated 
as ego-sensitivity. 

Factor III is pretty well described as a ‘‘hope-
lessness’’ indicator. It has its highest loadings
on D, Pt and Si. There is social withdrawal
here also, as indicated by the positive loadings
on Sc and Si, and the negative loading on Do.

Factor IV is an Allport factor designated here 
as ‘‘mystical’’ because of its high religious load- 
ing, but also because of the negative loadings on 
theoretical and dominance. Notice that persons 
described by this factor do not withdraw although 
they may not seek to lead, and that they do not 
falsify although they may tend to be subjective. 

Factor V represents the pole of verbal-artis- 
tic versus mathematical-practical. The guess 
is hazarded that this factor would show consider- 
able sex difference. The largest variances 
come on the Allport, but there are important 
loadings on Vocabulary and ‘‘Q’’. 

Factor VI seems to be less well defined than 
the others. It appears to represent manic ten- 
dencies, especially those connected with not fol- 
lowing through on a job. It is designated as 
‘‘manic irresponsibility’’.
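
The centroid extraction and the ‘‘more than half the variance’’ criterion used in Factor Analysis I can be illustrated with a minimal sketch. It assumes the standard first-centroid formula (column sums of the correlation matrix, with communality estimates in the diagonal, divided by the square root of the grand total); the sign reflections needed before extracting later factors are not shown. The function names and the small rank-one matrix are hypothetical and are not taken from the study.

```python
import math


def first_centroid_factor(R):
    """First centroid loadings: each column sum divided by the square root of the grand total."""
    n = len(R)
    col_sums = [sum(R[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    return [s / math.sqrt(total) for s in col_sums]


def residual_matrix(R, a):
    """Correlations left over after removing the factor with loadings a."""
    n = len(R)
    return [[R[i][j] - a[i] * a[j] for j in range(n)] for i in range(n)]


def communality(loadings):
    """h^2: the sum of squared loadings of one variable over the extracted factors."""
    return sum(x * x for x in loadings)


# Hypothetical matrix generated by a single factor with loadings .8, .6, .5
# (communalities in the diagonal).
R = [[0.64, 0.48, 0.40],
     [0.48, 0.36, 0.30],
     [0.40, 0.30, 0.25]]
a = first_centroid_factor(R)
print([round(x, 2) for x in a])
print(round(communality([0.65, -0.36, 0.11]), 2))
```

A variable is counted as having more than half its variance accounted for when its communality over the extracted factors exceeds .50; the last line applies the same sum of squares to three illustrative loadings.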


Factor Analysis II is detailed in Table IX.
From a matrix of 20 variables selected from the
California Psychological Inventory and the Guil-
ford-Zimmerman Temperament Survey, three
factors were extracted and rotated into oblique simple struc-
ture. The rotated factors might be designated
as: I, General Teaching Adjustment; II, Thought-
fulness or anti-delinquency; and III, General En-
ergy. The rather considerable loadings on Fac-
tor I round out the description of what teaching
adjustment as measured by the Teacher Progno-
sis Scale G1 represents.

Factor Analysis III is an attempt to combine
some of the leading variables of I and II. Table
X details the unrotated and rotated loadings
of the three factors extracted from ten selected
variables. Again Factor I seems to be General
Teaching Adjustment, Factor II General Energy,
and Factor III related to Status, Poise or Flex-
ibility. Reversing Factors II and III between
this and the last factor analysis, it appears that
the same factor space is described.

In Table XI, there is a short but interesting
digression with regard to Factor Analysis I.
The loadings on the first three factors, which account for
most of the variance, have been normalized and
plotted on a sphere, the positive hemisphere of
which is shown in Figure 1. It appears that while
the A.C.E. variable is at one pole of the sphere,
the other two poles represent something like ‘‘K’’
and anti-depression. The MMPI scores then
seem to array themselves on a band of narrow
width, nearly in the plane of the equator (or great
circle) of the A.C.E. pole. They can, thus, be
expressed by a single parameter angle, represent-
ing their deviation from an ideal pole ‘‘anti-de-
pression’’. Rotation, hence, seems unnecessary.
The concept of the different scales of the MMPI
being spread out in this fashion introduces some
very interesting speculations and possibilities
which only further research can verify or dis-
prove. These facts should be considered, of
course, in the light of the amount of variance that 
any particular scale has with respect to these 
three factors. The fact that paranoia and psycho- 
pathic deviate are nearly together does not indi- 
cate of itself that they represent the same thing, 
since only 23 percent of the variance of the first 
and 41 percent of the variance of the second is 
involved. Nevertheless, the relationships be- 
tween the MMPI scales are certainly rather 
graphically revealed in at least some of their di- 
mensions by Figures 1 and 2. 
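
The normalization and the single parameter angle can be sketched as follows. The sketch assumes that each variable's three loadings are scaled to unit length, and that the angle is defined so that the normalized second loading equals sin(Beta) and the normalized third loading equals cos(Beta - 180°), which is how the footnote to Table XI appears to read; that reading, the function names, and the choice of the Depression scale loadings (-.08, .13, .63) as an example are assumptions.

```python
import math


def normalize(loadings):
    """Scale a variable's loadings so the vector has unit length (a point on the sphere)."""
    length = math.sqrt(sum(x * x for x in loadings))
    return [x / length for x in loadings]


def beta_degrees(a2, a3):
    """Angle with a2 = sin(Beta) and a3 = cos(Beta - 180 deg), returned in [0, 360)."""
    return math.degrees(math.atan2(a2, -a3)) % 360.0


# Loadings of the Depression scale on the first three factors (assumed reading of Table XI).
depression = normalize([-0.08, 0.13, 0.63])
print([round(x, 2) for x in depression])
print(round(beta_degrees(depression[1], depression[2])))
```

The angle is meaningful only for scales whose normalized first loading is near zero, that is, for scales lying close to the equator of the A.C.E. pole.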


Samples Two and Three 


1. Introduction 


Samples Two and Three consist of much smal- 
ler populations and far fewer test scales. They 
are included chiefly because of the confirmation 
of the resulting factor analyses with some of the 
factor space and location of the vectors of the pre- 
vious analyses. It is considered significant that 
different factor analyses done with different tests 
and on different populations should turn up simi- 
lar results. 

The populations used in these samples were 
Education juniors at Los Angeles State College. 
The number of cases included in Sample Two was 
110 and in Sample Three 86. There was no over- 
lapping of personnel between the samples. The 
IBM equipment was not used to secure the inter- 
correlations, but the method of Flanagan, previ- 
ously mentioned, was employed. 


2. Description of the Variables of Sample Two 


Eight variables were used in Sample Two: 1)
a rating on authoritarianism, using a modified
Adorno ‘‘F’’ scale, 2) a rating on socio-econ-
omic status of parents of the type used by Sims
(ratings were made by respondents themselves),
3) a sample of the Tp scale containing about half
the items, 4) a sample of the Sc scale containing
about half the items, 5) a sample of the Tn scale
containing about half the items, 6) a sample of
the D scale containing about half the items, 7) the
Minnesota Teacher Attitude Inventory score, and
8) intelligence as measured by the Army General
Classification Test. Intercorrelations of these
variables are shown in Table XII.


3. Results of the Factor Analysis 
Three factors were extracted. Table XIII 


TABLE IX
(Factor Analysis II) 


FACTOR LOADINGS FOR MATRIX OF CERTAIN VARIABLES IN E, F, G SERIES* 


Unrotated 


Gi: Good Impression -55 72 76 -50 
Ds: Dissimulation 20 66 -87 -05 
Re: Social Responsibility -18 41 30 
Fl: Flexibility 06 33 30 70 


Status 44 55 78 58 
Dominance 50 52 68 20 


Social Participation 53 68 75 26 
Delinquency 32 25 -36 -93 
Intellectual Efficiency 29 60 90 33 
Academic Achievement -12 42 04 -35 
Poise, Self-confidence 74 80 40 84 
Impulsivity 74 80 -47 07 


St: 

Do: 
Sp: 
De: 


NARS 


General Energy 34 20 25 05 
Ascendance 39 38 76 20 
Stability -31 50 89 -46 
Friendliness -45 70 65 -03 
Thoughtfulness 26 21 10 
Personal Relations -27 50 85 -06 


Gl Teacher Positive 72 52 99 -O1 
G2 Tn: Teacher Negative -63 10 40 -98 05 


* Decimals omitted throughout. For further identification of the variables, see Table I. 
**Correlations between the rotated factors: I and II, .00; I and III, .00; II and III, .50.


Rotated 
Eb -31 
Ec 46 
El -83 
E3 -83 
E5 60 
E6 45 
E8 -30 
E9 00 
E10 -10 . 
E14 00 
E15 30 
Fl 88 
F3 42 
F5 -10 
F7 -75 
F8 40 
F9 -47 
-09 
00 


TABLE X
(Factor Analysis III)


FACTOR LOADINGS FOR MATRIX OF TEN SELECTED VARIABLES*


                                                   Unrotated            Rotated**
Code  Independent Variable                      I    II   III   h²     I    II   III

B3    American Council Psychological Total      15   17   -04   05     65   10   60
Dc    K: Suppressor Variable on MMPI            74  -33   -06   66     75  -55   10
D7    Pt: Psychasthenia on MMPI                -07  -36    42   31    -67  -98   10
Eb    Gi: Good Impression on CPI                60  -47   -39   73     75  -30  -25
E3    Fl: Flexibility on CPI                    51  -04    57   58     02  -92   72
E4    St: Status on CPI                         61   42    14   56     55  -30   82
E14   X1: Poise, Self-confidence on CPI         50   68    12   72     42  -05   80
F1    G: General Energy on G-Z                 -01   41   -35   29     35   85   00
F7    F: Friendliness                           66  -38   -08   58     73  -60   05
G1    Tp: Teacher Positive on MMPI              67   06   -43   63     99  -01   01


* Decimals are omitted throughout. For further identification of variables, see Table I.
**Correlations between the rotated factors: I and II, .00; I and III, .00; II and III, .50.


TABLE XI
(Detail of Factor Analysis I)


NORMALIZATION OF FIRST THREE FACTORS FOR CERTAIN VARIABLES
OF FACTOR ANALYSIS I*


                                                   Unrotated            Normalized
Code  Independent Variable                      I    II   III   h²     I    II   III   Beta**

B3    American Council Psychological Total      93   02   -01   86     99  -02  -01
Dc    K: Suppressor Variable on MMPI            04   80    18   67     05   98   22     80°
D1    Hs: Hypochondriasis                      -09   66    34   55    -12   89   45    115°
D2    D: Depression                            -08   13    63   42    -12   20   97    168°
D3    Hy: Hysteria                             -08   74    17   57    -10   98   22    100°
D4    Pd: Psychopathic Deviate                 -05   56    32   41    -08   88   50    120°
D6    Pa: Paranoia                             -08   44    19   23    -16   91   40    113°
D7    Pt: Psychasthenia                        -08   39    67   60    -10   50   86    150°
D8    Sc: Schizophrenia                         00   60    49   60     00   77   63    130°
D10   Si: Social Introversion                   02  -35    56   43     03  -54   86    210°
D11   Do: Dominance                             22   33   -51   41     34   52  -80     33°
D12   Re: Social Responsibility                 08   47   -33   33     14   82  -58     58°
D13   St: Status                                15   32   -49   36     25   53  -82     32°


* Decimals are omitted throughout. For further identification of the variable, see Table I.
**If the normalized loadings for the three factors are represented by a1j, a2j, a3j, then if (a1j)²
= 0, approximately (or is less than .1), a2j = sin Beta, and a3j = cos (Beta - 180°); in other words,
Beta is the parameter angle along which the MMPI scales seem to be positioned. (See Figures 1
and 2.)


Figure 1
Diagram of Positive Hemisphere, MMPI


Figure 2
Showing Parameter Beta in Relation to Scales


TABLE XII
INTERCORRELATIONS OF VARIABLES OF SAMPLE TWO (LOS ANGELES STATE COLLEGE)*


Code and Scale


H1 Authoritarianism from Modified Adorno Scale -41
I1 Socio-economic Status of Parents (Sims type) -14
G1 Tp: Teacher Positive (MMPI)** 33
D8 Sc: Schizophrenia (MMPI)** -24
G2 Tn: Teacher Negative (MMPI)** -42
D2 D: Depression (MMPI)** -30
J1 Minnesota Teacher Attitude Inventory (MTAI) 54
K1 Intelligence: Army General Classification Test


* Decimals are omitted throughout. For identification of the scales, consult Table I and introductory
context to Table XII.
**Represents only a sample of the scale named.


TABLE XIII
(Factor Analysis IV)


FACTOR LOADINGS FOR MATRIX OF TABLE XII (LOS ANGELES STATE COLLEGE)* 


Unrotated Rotated**
Independent Variable


H1 Authoritarianism (Modified Adorno) 79 -17 71

I1 Socio-economic Status of Parents 16 53 42

G1 Tp: Teacher Positive (MMPI) 13 -08 51 25
D8 Sc: Schizophrenia (MMPI) 70 -63 30 98 -74
G2 Tn: Teacher Negative (MMPI) 79 «6-09 46€600~—Cls«*&G 20 
D2 D: Depression (MMPI) 53 -03 -03 28 -05
J1 Minnesota Teacher Attitude Inventory -65 -50 -06 67 80 -37
K1 Intelligence: (AGCT) 58 -40 -49 74 65 -10


* Decimals are omitted throughout. For further identification of variables refer to Table I or
Table XII.
**Correlations between the rotated factors: I and II, -.09; I and III, -.09; II and III, .03.


Il Gl D6 G2 D2 A Kl 
a 
10 
90 
10 
00 
-15 
-12 
-24 
-65 




TABLE XIV


INTERCORRELATIONS OF VARIABLES OF SAMPLE THREE (LOS ANGELES STATE COLLEGE)*


Code  Scale                             L2   L3   L4   L5   M1   J1   G1   G2   H1

L1    Bell Adjustment: Home             56   44   59   31  -19   13   36  -46  -13
L2    Bell Adjustment: Health                30   68  -02  -03  -02   38  -51   07
L3    Bell Adjustment: Social                     62   20  -07  -03   49  -43  -24
L4    Bell Adjustment: Emotional                       33  -24   24   42  -37  -32
L5    Bell Adjustment: Vocational                          -21   22   30  -43  -35
M1    RAPH: Rigidity                                            -73  -37   25   30
J1    MTAI: Teacher Attitude                                          60  -47  -30
G1    Tp: Teacher Positive (MMPI)                                          -47  -52
G2    Tn: Teacher Negative (MMPI)                                                42


*Decimals are omitted throughout. For identification of the scales, consult Table I and introductory
context to Table XIV. Maximum standard error of R is .11.


TABLE XV
(Factor Analysis V)


FACTOR LOADINGS FOR MATRIX OF TABLE XIV (LOS ANGELES STATE COLLEGE)*


                                             Unrotated            Rotated**
Code  Independent Variable                I    II   III   h²     I    II   III

L1    Bell Adjustment: Home               65  -36    11   56    -03  -07   88
L2    Bell Adjustment: Health             53  -53    42   74     00   26   80
L3    Bell Adjustment: Social             60  -41   -35   65    -55  -09   91
L4    Bell Adjustment: Emotional          77  -40   -22   80    -10  -34   95
L5    Bell Adjustment: Vocational         47   13   -31   34    -18  -77   40
M1    RAPH: Rigidity                     -52  -54         56    -63   94   07
J1    MTAI: Teacher Attitude              56   65    26   80     78  -56   07
G1    Tp: Teacher Positive (MMPI)         77   19   -01   63     76   00   00
G2    Tn: Teacher Negative (MMPI)        -75   03   -26   62    -39   59  -40
H1    Authoritarianism (Adorno)          -49  -34    33   46     00   8:  -15


* Decimals omitted throughout. For identification of variables see Table XIV.
**Correlations between the rotated factors: I and II, .47; I and III, .53; II and III, -.


TABLE XVI


ESTIMATED INTERCORRELATIONS AND ANGULAR DEVIATIONS BETWEEN CERTAIN VARIABLES
IN THE FACTOR SPACE OF FIGURE 3

[Table printed sideways in the original; the body is not legible in this copy.]

FIGURE 3
(® means antipodes of the vector point of contact with sphere.
Lines connect substantially similar measures.)


shows the unrotated and rotated loadings for ob-
lique simple structure. The first factor seems
to be General Teaching Adjustment, the second
Authoritarianism, and the third related to Status.


4. Description of the Variables of Sample Three 


The ten variables of sample three were four 
of those used in Sample Two and six new ones. 
The first five consisted of the home, health, so- 
cial, emotional and vocational scales of the Bell 
Adjustment Inventory (Adult Form). These 
scales were reversed so that a ‘‘low’’ score was 
considered high in desirability in interpretation; 
in other words, the scales were turned so that 
the desirable social result was ‘‘up’’. In conse- 
quence, these scales would be expected to corre- 
late positively with a scale of adjustment and neg- 
atively with a scale of maladjustment. The sixth 
measure was the so-called RAPH scale, a meas-
ure of rigidity of attitudes regarding personal
habits developed by Meresko (8). The last four 
scales were the Minnesota Teacher Attitude In- 
ventory, the Tp and Tn scales for the MMPI, 
and the Adorno-type authoritarianism scale used 
in Sample Two. The intercorrelations of these 
variables are shown in Table XIV. 


5. Factor Analysis Results 


Three factors were extracted. Table XV 
shows the unrotated and rotated loadings for ob- 
lique simple structure. The first factor again 
seems to be General Teaching Adjustment, and 
the second again Authoritarianism. The third 
seems to be most closely related to the Bell Emo- 
tional Scale. 
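
As a rough check on the tabled values, the communality of a variable can be recomputed as the sum of its squared unrotated loadings. The sketch below does this for two rows of Table XV, using the loadings as far as they can be read from the table and restoring the omitted decimals; the function name is illustrative and is not taken from the paper.

```python
def communality(loadings):
    """Sum of squared factor loadings (loadings given as decimals)."""
    return sum(a * a for a in loadings)

# Unrotated loadings on factors I, II, III as read from Table XV.
bell_home = [0.65, -0.36, 0.11]
bell_emotional = [0.77, -0.40, -0.22]

h2_home = communality(bell_home)
h2_emotional = communality(bell_emotional)

# Compare with the tabled communalities of 56 and 80 (decimals omitted).
print(round(h2_home, 2), round(h2_emotional, 2))
```

Agreement to two decimals suggests that the fourth column of the unrotated block is the communality.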


6. Collated Results 


When figures were drawn to represent the 
several factor analyses and were compared, it 
appeared that most of them displayed a common
factor space. It is recognized that much of the 
variance of the variables cannot be expressed in 
three dimensions, yet there seemed to be enough 
of a ‘‘common view’’ to make it worthwhile to su-
perimpose the diagrams. This has been done in
Figure 3, which shows the vector intersections 
of selected scales with the positive hemisphere 
pooling the results of a number of different fac- 
tor analyses. The Tp point has been used to lo- 
cate the center pole, and the authoritarian pole 
is at the extreme left, so that the vertical axis 
is its great circle. Lines have been drawn be- 
tween variables which purport to measure the 
same thing. Subscripts indicate which factor 
analysis was involved. It will be noted that 
there is considerable uniformity in the position 
of the points, even as between different factor 
analyses. The angle between authoritarianism 


and intelligence, for example, seems to be about 
135 degrees. The 30 degree and 60 degree small 
circles have been drawn, and various areas of the 
circumference great circle have been named. 
From Figure 3, Table XVI has been construct- 
ed. This table gives in very rough form estima- 
tions of the correlations between selected clusters 
noted in Figure 3. It also gives the angular devi- 
ation measured from the pole between these clus- 
ters. Such an arrangement provides a kind of 
circular coordinate system. It is to be empha- 
sized that measurements are rough only and are 
therefore inexact. The arrangement of these vec- 
tors in the factor space is perhaps made more 
understandable by such a procedure. At least,
their relationships to each other in the common
factor space become more apparent. It is the
contention of the writers that the existence of this 
common factor space as revealed by several fac- 
tor analyses of different tests on different popula- 
tions helps to further understanding with regard 
to the interrelationships between these variables. 
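
The passage from angular deviation to estimated correlation can be sketched as follows. The sketch assumes that the test vectors are of unit length in the common factor space, so that the correlation between two variables equals the cosine of the angle between them; the paper does not state this assumption, and with communalities below one the true correlations would be smaller in absolute value. The function names are illustrative only.

```python
import math

def correlation_from_angle(degrees):
    """Correlation implied by the angle between two unit-length test vectors."""
    return math.cos(math.radians(degrees))

def angle_from_correlation(r):
    """Inverse relation: angle in degrees implied by a correlation r."""
    return math.degrees(math.acos(r))

# The text reports about 135 degrees between authoritarianism and intelligence.
r_auth_intel = correlation_from_angle(135.0)
print(round(r_auth_intel, 2))
```

Under the stated assumption, an angle of 135 degrees corresponds to a correlation of roughly -.71, which is best read as an upper bound in absolute value.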
The writers believe further interpretation 
should await corroboration by others and further 
exploration. They are aware of the rough and of- 
ten informal methods utilized with some of the da- 
ta and of the incompleteness of many of its parts. 
It was Thurstone himself who said, in this regard: 


The exploratory nature of factor anal- 
ysis is not often understood. Factor an- 
alysis has its principal usefulness at the 
borderline of science.... These new 
methods have a humble role. They en- 
able us only to make the crudest first 
map of a new domain. But if we have sci-
entific intuition and sufficient ingenuity,
the rough factorial map of the new domain 
will enable us to proceed beyond..... 
(11:56) 


It is in this light that these explorations are of- 
fered. 


Summary 


This paper reports the intercorrelations and 
resulting factor analyses from giving extensive 
testing batteries to teaching candidates. The ma- 
jor work sample was at UCLA, involving numbers 
ranging upwards to 1700 subjects and scales on 
Cooperative English, Stanford Arithmetic, Amer- 
ican Council Psychological, Allport Study of Val- 
ues, Minnesota Multiphasic, California Psycho- 
logical, Guilford-Zimmerman Temperament, 
and two scales on Teaching Prognosis devised by 
the writers. The minor work samples included 
two groups of about 100 subjects at Los Angeles 
State College, involving intelligence, status, au-
thoritarianism, Minnesota Teacher Attitude In-
ventory, Bell Adjustment Inventory, and the pre-
viously mentioned teaching scales.

Results of the factor analyses seemed to show 
a common factor space, and helped to clarify the 
relation of other generally used variables to this 
measure of teaching potential. Further investi- 
gations appear in order. 


REFERENCES 


1. California Teachers’ Association. ‘‘Minutes 
of the Third Annual State Conference on Ed- 
ucational Research, ’’ California Teachers’ 
Association Research Bulletin,
(December 1951), p. 211.


2. Flanagan, J. C. ‘‘The Effectiveness of Short
Methods of Calculating Correlation Coeffi-
cients,’’ Psychological Bulletin, XLIX (Ju-
ly 1952), pp. 342-48.


3. Gough, H. G. A Preliminary Guide for the
Use and Interpretation of the California 
Psychological Inventory, mimeographed, 
Institute of Personality Assessment and 
Research, University of California, Berke- 
ley, California, 1954. 


4. Gough, H. G., and Pemberton, W. H. ‘‘Per-
sonality Characteristics Related to Success
in Student Teaching,’’ Journal of Applied
Psychology, XXXVI (October 1952), p. 307.


5. Gowan, J. C., and Gowan, May Seagoe. A
Teacher Prognosis Scale for the MMPI,
mimeographed, Department of Education,
University of California, Los Angeles, Cal-
ifornia, 1954.


6. Gowan, J. C. ‘‘A Comparison of Women 
Graduates in Elementary Education with 
Teachers Holding Emergency Credentials,’’
Educational Horizons, XXXII (Winter 
1953), pp. 134-38. 


7. MacLean, M. S., and others. ‘‘A Teacher
Selection and Counseling Service, ’’ Journal 
of Educational Research, XLVIII (May 1955), 
pp. 669-78. 


8. Meresko, R., and others. ‘‘Rigidity of Atti-
tudes Regarding Personal Habits and Its
Ideological Correlates,’’ Journal of Ab-
normal and Social Psychology, XLIX (Jan-
uary 1954), pp. 89-94.


9. Ryans, D. G. ‘‘Teacher Personnel Research:
I. Considerations Relative to Research De- 
sign, ’’ California Journal of Educational 
Research, IV (January 1953), pp. 19-27. 


10. Seagoe, May V. ‘‘Prognostic Tests and Teach- 
ing Success,’’ Journal of Educational Re- 
search, XXXVIII (May 1945), pp. 685-90. 


11. Thurstone, L. L. Multiple Factor Analysis 
(Chicago: University of Chicago Press, 
1947). 




JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, September 1958) 


APPLICATION OF ANALYSIS OF VARIANCE
TO THE ESTIMATION OF THE RELIABIL-
ITY OF OBSERVATIONS OF TEACHERS’
CLASSROOM BEHAVIOR


DONALD M. MEDLEY and HAROLD E. MITZEL 


Division of Teacher Education 
Municipal Colleges of New York City 


MANY EDUCATORS and psychologists have 
come to believe that the efficient way to esti- 
mate the internal consistency of a measuring in- 
strument is to divide it into halves, score each 
half separately, correlate the pairs of scores so 
obtained by means of the Pearson product-mo- 
ment coefficient, and correct this value for the 
fact that it is based on halves instead of wholes. 
This procedure was developed independently by 
Spearman (13) and by Brown (2) for the solution 
of certain test construction problems. It is also
widely believed that the efficient way to esti- 
mate the stability of an instrument is to adminis- 
ter equivalent forms to a sample of subjects and 
correlate the two sets of scores so obtained. It 
is the purpose of this paper to suggest the effi- 
ciency of the analysis of variance technique for 
estimating the reliability of educational meas- 
ures, and to illustrate use of the technique on ob- 
servations of teachers’ classroom behavior. 
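
For readers unfamiliar with the split-half procedure just described, the correction attributed to Spearman and Brown can be sketched in a few lines. The general form with a lengthening factor k is the standard textbook statement rather than anything given in this paper, and the half-test correlation of .60 is a hypothetical value chosen only for illustration.

```python
def spearman_brown(r_part, k=2.0):
    """Reliability of a test k times as long as the part that yielded r_part.

    With k = 2 this is the usual correction of a split-half correlation.
    """
    return k * r_part / (1.0 + (k - 1.0) * r_part)

# Hypothetical correlation between the two half-test scores.
r_half = 0.60
r_whole = spearman_brown(r_half)
print(round(r_whole, 2))
```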

The analysis of variance has been suggested 
as a method for estimating test reliability by 
Jackson (6,7), Hoyt (5) and Alexander (1). Its 
use for estimating the reliability of grades as- 
signed to compositions or essay examinations 
was described by Pilliner (11). Lindquist (8: 
357-82) has recently presented a rather com-
plete discussion of the general use of the analy-
sis of variance technique for reliability estima-
tion in educational and psychological measure-
ment. The method is particularly well adapt-
ed to observational data, as Lindquist remarks,
but concrete examples of its proper use are not 
available in the literature. 

In connection with a longitudinal study of 
teacher education graduates of the New York 
City municipal colleges (14), the reliability of 
two observational techniques for assessing 
teachers’ classroom behaviors was studied. 
The method of estimating reliability used in 
this longitudinal study will be described briefly, 
and its application to the classroom observation 
data will be illustrated. Methods of computa-
tion will not be given since they are readily acces-
sible in other sources; emphasis will be on the
logic of the procedure and the interpretation of re-
sults.


Method of Analysis and Definition of Terms 


Suppose that N teachers are visited m times 
each by a team of n observers. Each teacher will
be assigned a score on the dimension of interest 
by each observer on each visit, yielding a total 
of mn scores per teacher and a grand total of 
Nmn observations for analysis. 

Among the factors which may be expected to 
produce variation among the scores are two: dif- 
ferences among teachers and differences among 
visits. For convenience, $T_i$ will be used to rep-
resent the deviation (from the mean of all obser-
vations) associated with Teacher i, and $V_j$ will be
used to represent the deviation associated with
Visit j. It is understood that $T_i$ will be the same
for Teacher i on every visit and $V_j$ will be the
same for all teachers on the jth visit to each of
them.

If $P_{ij}$ is the performance of Teacher i on visit
j, it is probable that


$$P_{ij} \neq T_i + V_j$$


In other words, there is likely to be an ‘‘inter-
action’’ between visits and teachers: some teach-
ers may do better on the first visit than on any
other; other teachers may do better on the last
visit, etc. Therefore, let


$$I_{ij} = P_{ij} - (T_i + V_j); \tag{1}$$


$I_{ij}$ is the interaction component of a teacher’s
score on a particular occasion.

When a particular observer k visits a particu-
lar teacher i on a particular occasion j, the
score $X_{ijk}$ (taken as a deviation from the mean
of all values of $X_{ijk}$) that the observer assigns
to the teacher may not be identically equal to
$P_{ij}$, the actual performance of that teacher on
that visit. Define


$$e_{ijk} = X_{ijk} - P_{ij} = X_{ijk} - T_i - V_j - I_{ij} \tag{2}$$


The ‘‘error’’ $e_{ijk}$ will include all parts of the
score not otherwise accounted for in equation
(2).

$I_{ij}$ will be referred to as the visit error for
teacher i on visit j; $e_{ijk}$ as the residual or ob-
server error for teacher i on visit j in observa-
tion k. Error is thus partitioned into two parts:
one containing errors due to lack of stability
in teacher performance, and the other contain-
ing all errors independent of such lack of stabil-
ity. (The latter component is referred to as
‘‘observer’’ error because it will show up in the
discrepancy between two records of the same
performance made by two different observers.)

If (2) is rewritten as follows:


$$X_{ijk} = T_i + V_j + I_{ij} + e_{ijk},$$


and if both sides are squared and the operation
of taking mathematical expectations in the popu-
lation (generated as N, m, and n all approach
infinity) is performed, the result may be writ-
ten:


$$\sigma_x^2 = \sigma_t^2 + \sigma_v^2 + \sigma_{tv}^2 + \sigma^2, \tag{3}$$


where $\sigma_x^2$ is the total variance for all observa-
tions X, $\sigma_t^2$ is the variance of the $T_i$, $\sigma_v^2$ of the
$V_j$, $\sigma_{tv}^2$ of the $I_{ij}$, and $\sigma^2$ of the $e_{ijk}$, in their
respective populations.

What is meant by the ‘‘reliability’’ of a scale
depends on what true score is of interest, since
the error in a score is the difference between it
and the true score it estimates. As will be
shown, the reliability of a scale as a measure
of $P_{ij}$ will generally be greater than its reliability
as a measure of $T_i$. In the present instance, the
true score of interest is $T_i$, the mean of all per-
formances $P_{ij}$ of teacher i on all occasions j on
which a visit might be made to the teacher. Ideally,
the population of visits j should include all possible
situations that arise during a teacher’s career.
More realistically, it should include all situa-
tions during a particular school year or term;
this could be approximated by use of proper
sampling procedures in selecting the times at
which observations are made. Then $T_i$ would
represent the ‘‘typical’’ performance of Teach-
er i.

Similarly, the nature of the population of 
teachers is not clearly perceived unless the 
teachers observed are drawn at random from a 
specified population of teachers. It will be as-
sumed, however, that both populations do exist,
whether or not they can be specified.

The reliability of a score based on a single ob-
servation may then be defined as


$$R = \sigma_t^2 / (\sigma_t^2 + \sigma_{tv}^2 + \sigma^2) \tag{4}$$


The numerator on the right is the variance of
$T_i$; that is, the ‘‘true score’’ variance. The de-
nominator is the sum of the true score variance
and the two error variances. Comparison with
equation (3) reveals that this sum represents the
total variance of the scores with the component
due to visit differences, $\sigma_v^2$, removed. This var-
iance is removed because we will compare teach-
ers who have all been visited equally often, and
the scores will be means over all visits so that
the visit effects are cancelled out. The denomin-
ator in equation (4) is the total variance of these
scores about their mean. The reliability coeffi-
cient is thus seen to be the ratio of true score
variance to total variance, or, in other words, the
proportion of the total variance attributable to dif-
ferences among teachers.

This reliability coefficient R is the parameter 
that is usually estimated by correlating scores as- 
signed to a set of teachers by two observers vis- 
iting the teachers at different times. This meth- 
od of estimating reliability is quite unsatisfactory, 
however, since only two scores per teacher can 
be used, with the result that the estimate has 
very low precision when the number of teachers 
is small. 

A second type of ‘‘reliability’’ coefficient that
is sometimes used regards $P_{ij}$, the true perform-
ance of teacher i on visit j, as the true score to
be estimated. This coefficient will be referred
to here as the coefficient of observer agreement,
R', and may be defined as follows:


$$R' = (\sigma_t^2 + \sigma_{tv}^2) / (\sigma_t^2 + \sigma_{tv}^2 + \sigma^2) \tag{5}$$


In this case, fluctuations in teacher perform-
ance are regarded as part of true score variance,
since they are capable of being observed by all ob- 
servers present on a particular occasion. This 
coefficient may also be estimated by correlating 
scores assigned to a group of teachers by two ob- 
servers visiting each of them at the same time; 
it is a measure of observers’ ability to agree in 
their records of the same performance. R' does 
not indicate how reliably the teachers are dis- 
criminated, and therefore should not be referred 
to as a reliability coefficient. 

The reliability of the mean of a number of
scores assigned to the same teacher is easily de-
rived. In terms of the observer team size n, and
the number of visits, m, it is $R_{mn}$, where


$$R_{mn} = (mn\sigma_t^2) / (mn\sigma_t^2 + n\sigma_{tv}^2 + \sigma^2) \tag{6}$$
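
Equations (4) to (6) translate directly into code. The following is a minimal sketch; the function and argument names are illustrative and not from the paper, and the variance components used in the example are the Differentiation estimates reported later in Table V.

```python
def single_score_reliability(var_t, var_tv, var_e):
    """Equation (4): reliability of one score from one observer on one visit."""
    return var_t / (var_t + var_tv + var_e)

def observer_agreement(var_t, var_tv, var_e):
    """Equation (5): coefficient of observer agreement."""
    return (var_t + var_tv) / (var_t + var_tv + var_e)

def mean_score_reliability(var_t, var_tv, var_e, m, n):
    """Equation (6): reliability of the mean over m visits by n observers."""
    return (m * n * var_t) / (m * n * var_t + n * var_tv + var_e)

# Differentiation scale: teacher, visit-error, and observer-error components.
R = single_score_reliability(6.62, 5.35, 6.71)
R_prime = observer_agreement(6.62, 5.35, 6.71)
R_32 = mean_score_reliability(6.62, 5.35, 6.71, m=3, n=2)
print(round(R, 2), round(R_prime, 2), round(R_32, 2))
```

Setting m = n = 1 in the third function reproduces equation (4), which is a convenient consistency check on the three definitions.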


If it is assumed that $T_i$, $V_j$, $I_{ij}$, and $e_{ijk}$ are
normally and independently distributed in repeat-
ed random sampling with zero means and vari-
ances $\sigma_t^2$, $\sigma_v^2$, $\sigma_{tv}^2$, and $\sigma^2$, respectively, then
the values of these variance components and
hence of the coefficients R and R' may be esti-
mated from an analysis of variance table of the
form shown in Table I.

Table I is based on samples of N teachers, 
m visits, and teams of n observers each. The 
observed mean squares and their expectations 
in terms of variance components are shown at 
the right. Estimates of the variance components 
of interest may be obtained from the observed
mean squares as follows (the symbol ‘‘(=)’’ may
be read, ‘‘is estimated by’’):


$$\sigma^2 \;(=)\; s^2$$
$$\sigma_{tv}^2 \;(=)\; (s_{tv}^2 - s^2)/n \tag{7}$$
$$\sigma_t^2 \;(=)\; (s_t^2 - s_{tv}^2)/mn$$


By substituting these estimates and the ap- 
propriate values of m and n in equations (4) to 
(6), estimates of the coefficients of reliability 
and of observer agreement secured in a given 
experiment may be obtained. 
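
The estimation step in equation (7) can be sketched as below. The function name is illustrative; the mean squares in the example are those reported for the Differentiation scale in Table IV, with m = 3 visits and n = 2 observers per team.

```python
def variance_components(ms_teachers, ms_visit_error, ms_observer_error, m, n):
    """Equation (7): estimates of the observer, visit-error, and teacher components."""
    var_e = ms_observer_error
    var_tv = (ms_visit_error - ms_observer_error) / n
    var_t = (ms_teachers - ms_visit_error) / (m * n)
    return var_t, var_tv, var_e

# Observed mean squares for Differentiation (Table IV).
var_t, var_tv, var_e = variance_components(57.1348, 17.4116, 6.7121, m=3, n=2)
print(round(var_t, 2), round(var_tv, 2), round(var_e, 2))
```

Note that an estimate obtained in this way can be negative when a mean square falls below the one subtracted from it; the sketch returns the raw value and leaves the treatment of such cases to the significance tests described in the text.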

It is also possible to test the hypotheses,


$$H_0: \sigma_t^2 = 0$$


and


$$H_1: \sigma_{tv}^2 = 0$$


Hypothesis $H_0$ states that the scale fails to
discriminate among teachers; hypothesis $H_1$
states that there is, on the average, no greater
variation between two records based on differ-
ent visits than between two records based on the
same visit.

$H_1$ is tested by comparing


$$F_1 = s_{tv}^2 / s^2$$


with Snedecor’s F distribution (10:222-225) with
degrees of freedom $n_1 = (N - 1)(m - 1)$ and
$n_2 = Nm(n - 1)$. If $H_1$ is accepted, it is concluded
that $\sigma_{tv}^2 = 0$, and Table I is superseded by an
analysis of the form shown in Table II. For $\sigma_{tv}^2$
in equations (3) to (6) a zero is substituted, and
the estimation equations for the variance com-
ponents become:


$$\sigma^2 \;(=)\; s_e^2$$
$$\sigma_t^2 \;(=)\; (s_t^2 - s_e^2)/mn \tag{8}$$


Since $\sigma_{tv}^2 = 0$, $H_0$ is tested by comparing


$$F_0 = s_t^2 / s_e^2$$


with the tables of the F distribution with degrees
of freedom $n_1 = (N - 1)$ and $n_2 = N(mn - 1) - (m - 1)$.

If $H_1$ is rejected, it is concluded that $\sigma_{tv}^2 > 0$,
and $H_0$ is tested by


$$F_0 = s_t^2 / s_{tv}^2$$


with the F tables with degrees of freedom
$n_1 = (N - 1)$ and $n_2 = (N - 1)(m - 1)$.

If $H_0$ is accepted, it is best to conclude that
the reliability of the scale is zero; if $H_0$ is reject-
ed, it is proper to estimate R as indicated in equa-
tions (4) and (7).
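
The two F ratios and their degrees of freedom for a single tryout analyzed as in Table I can be sketched as follows. The sketch covers only the case in which the visit-error mean square is used as the error term for teachers; critical values are not computed and must still be read from an F table. The function name and the mean squares in the example are hypothetical.

```python
def f_ratios(N, m, n, ms_teachers, ms_visit_error, ms_observer_error):
    """F ratios for H1 (no visit error) and H0 (no teacher differences)."""
    f1 = ms_visit_error / ms_observer_error
    df1 = ((N - 1) * (m - 1), N * m * (n - 1))
    # H0 tested against visit error, as when H1 has been rejected.
    f0 = ms_teachers / ms_visit_error
    df0 = (N - 1, (N - 1) * (m - 1))
    return {"F1": f1, "df1": df1, "F0": f0, "df0": df0}

# Hypothetical mean squares for one tryout with N=11, m=3, n=2.
result = f_ratios(11, 3, 2, ms_teachers=50.0, ms_visit_error=20.0,
                  ms_observer_error=8.0)
print(result)
```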


Application of the Design to Tryouts of the 
Cornell Technique 


The first of the two techniques employed in this 
study was developed by Francis G. Cornell and 
his associates at the University of Illinois. For 
the purposes of this investigation, Cornell’s tech- 
nique was modified slightly; readers interested 
in the original form should consult the monograph 
in which it was originally presented (3); the modi- 
fied form is described in a monograph by Medley 
and Mitzel (9). 

Six observers participated in the tryouts. They 
visited 33 teachers in teams of two observers 
each. Each of the six observers saw each of the 
33 teachers once, so that the total number of 
scores on each of the eight dimensions was 198. 
The six observers were grouped into one set of 
three teams and the first eleven teachers were 
visited by each team, no two teams visiting the 
same teacher on the same day. The six observ- 
ers were then rearranged into a different set 
of three teams, and eleven more teachers were 
visited. Finally, the team composition was 
changed a third time and the remaining eleven 
teachers were visited. 

The simplest way of analyzing these data is to 
regard each series of visits to eleven teachers as 
a distinct tryout. In this case, N=11, m=3, and
n=2. The design in Table I could be used to an-
alyze the results of each tryout separately.

If it is assumed that all 33 teachers may be re- 
garded as having been drawn at random from the 
same population of teachers, and that all of the 
nine teams used may be regarded as having been 
drawn at random from the same population of 
teams, then it is reasonable to expect that the cor- 
responding observed mean squares in different 
tryouts estimate parameters of the same popula- 
tion, and the respective sums of squares may be 
pooled to yield more precise estimates of the par-
ameters. Since there are 198 scores in all, yield-
ing a total of 197 degrees of freedom, and since
each tryout employs 66 scores, yielding 65 de- 
grees of freedom per tryout, or a total of 195, 
there are two degrees of freedom remaining. 
These two degrees of freedom may (under the as- 
sumption stated) be used to estimate the ‘‘teach- 
er’’ mean square from differences between 
groups of teachers, making a total of 32 degrees 
of freedom available for this purpose. The com- 
plete design is shown in Table III. 
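
The bookkeeping of degrees of freedom in the pooled design can be verified with a few lines of arithmetic. This is only a sketch of the counting argument given above, with variable names chosen for illustration.

```python
tryouts, N, m, n = 3, 11, 3, 2

scores_per_tryout = N * m * n                      # 66 scores in each tryout
total_df = tryouts * scores_per_tryout - 1         # 197
within_df = tryouts * (scores_per_tryout - 1)      # 195
between_groups_df = total_df - within_df           # 2, between groups of teachers

teachers_df = tryouts * (N - 1) + between_groups_df
visits_df = tryouts * (m - 1)
visit_error_df = tryouts * (N - 1) * (m - 1)
observer_error_df = tryouts * N * m * (n - 1)

print(teachers_df, visits_df, visit_error_df, observer_error_df)
```

The four components add up to the total of 197, and the teacher line receives the 32 degrees of freedom mentioned in the text.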


TABLE I


PLAN FOR RELIABILITY ANALYSIS OF VARIANCE OF
SCORES ON A BEHAVIORAL DIMENSION


                                              Mean Squares
Source of
Variation         d.f.                Observed     Expected

Teachers          N - 1               $s_t^2$      $\sigma^2 + n\sigma_{tv}^2 + mn\sigma_t^2$
Visits            m - 1               $s_v^2$      $\sigma^2 + n\sigma_{tv}^2 + Nn\sigma_v^2$
Visit Error       (N - 1)(m - 1)      $s_{tv}^2$   $\sigma^2 + n\sigma_{tv}^2$
Observer Error    Nm(n - 1)           $s^2$        $\sigma^2$

Total             Nmn - 1


TABLE II


PLAN FOR RELIABILITY ANALYSIS OF SCORES ON A BEHAVIORAL DIMENSION (WHEN
THERE IS NO INTERACTION BETWEEN TEACHERS AND VISITS)


                                                Mean Squares
Source of
Variation         d.f.                  Observed     Expected

Teachers          N - 1                 $s_t^2$      $\sigma^2 + mn\sigma_t^2$
Visits            m - 1                 $s_v^2$      $\sigma^2 + Nn\sigma_v^2$
Error             N(mn - 1) - (m - 1)   $s_e^2$      $\sigma^2$

Total             Nmn - 1


TABLE III


COMPLETE DESIGN FOR ANALYZING THE SCORES OBTAINED ON A CORNELL
SCALE IN THE SERIES OF THREE TRYOUTS


                                    Mean Squares
Source of
Variation         d.f.      Observed     Expected

Teachers           32       $s_t^2$      $\sigma^2 + 2\sigma_{tv}^2 + 6\sigma_t^2$
Visits              6       $s_v^2$      $\sigma^2 + 2\sigma_{tv}^2 + 22\sigma_v^2$
Visit Error        60       $s_{tv}^2$   $\sigma^2 + 2\sigma_{tv}^2$
Observer Error     99       $s^2$        $\sigma^2$

Total             197


TABLE IV
RELIABILITY ANALYSIS OF DIFFERENTIATION SCORES


Source of
Variation         d.f.      Sum of Squares     Mean Square

Teachers           32         1828.3132          57.1348
Visits              6          149.9696          24.9949
Visit Error        60         1044.6970          17.4116
Observer Error     99          664.5000           6.7121

Total             197         3687.4798


As an example, the complete analysis of var- 
iance for the Differentiation scale, based on the 
three tryouts, is shown in Table IV. 

The test of $H_1$ ($\sigma_{tv}^2 = 0$) yielded an F ratio
of 2.59, which is beyond the .01 point of the ta-
bled distribution; it was, therefore, concluded
that $\sigma_{tv}^2$ is greater than zero, and that varia-
tions in teacher performance from day to day
are a factor contributing to the error in Differ-
entiation scores.

The test of $H_0$ ($\sigma_t^2 = 0$) yielded an F ratio of
3.28, which is also beyond the .01 point. It was,
therefore, concluded that $\sigma_t^2$ is greater than
zero, and that the Differentiation scale discrim-
inates teachers with a reliability greater than
zero.

When estimated according to equation (7), the
components of the variance of a score were tak-
en to be as follows:


$$\sigma^2 \;(=)\; 6.71$$
$$\sigma_{tv}^2 \;(=)\; 5.35$$
$$\sigma_t^2 \;(=)\; 6.62$$


The estimated reliability of a Differentiation
score based on a single record for one 25-min-
ute visit is:


$$r = (6.62)/(6.62 + 5.35 + 6.71) = .35$$


and the estimated coefficient of observer agree-
ment is:


$$r' = (11.97)/(18.68) = .64$$


The r of .35 indicates that 35 percent of the var- 
iance of the scale is due to differences among 
teachers; 65 percent must, then, be due to er- 
rors of measurement. From the estimated com-
ponents we calculate that 29 percent of the vari-
ance is due to visit-to-visit variations in teach- 
er behavior, and 36 percent to discrepancies be- 
tween different observers’ records of the same 
behavior. 
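
The percentages quoted above follow from dividing each estimated component by their sum. A short sketch, using the Differentiation estimates and illustrative names:

```python
components = {
    "teachers": 6.62,        # true score variance
    "visit error": 5.35,     # visit-to-visit variation in teacher behavior
    "observer error": 6.71,  # disagreement between observers' records
}

total = sum(components.values())
shares = {name: 100.0 * value / total for name, value in components.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.0f} percent")
```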

A similar analysis was carried out for each 
of eight scales employed in Cornell’s technique. 
The results are summarized in Table V which 
gives the estimates of variance components and 
the coefficients of reliability and observer agree- 
ment. 

Three of the scales did not detect differences 
among the teachers in this study—Pupil Climate, 
Pupil Initiative, and Content. In the instances 
of Pupil Initiative and Content there is evidence 
that observers were able to agree on scores 
based on a single performance to the extent
necessary to achieve correlations of .43 and .23 




*When these curves were plotted, the estimate of each component was computed according to equation 
(7), whether or not the component could be shown to be different from zero. 


respectively; but there was so much variation 
from one performance to another that no stable 
difference among teachers was detected. The 
Pupil Climate dimension was not observable by 
the six observers employed—no two of them could 
agree about the score to be assigned a given per- 
formance observed by both. 

There are two scales—Variety and Teacher 
Climate—for which no error due to instability of 
teacher performance was detected. This sug- 
gests that the performance of teachers in these 
respects is relatively stable. 

The reliabilities of the best five scales were 
all of the same order of magnitude, ranging from 
.32 to .42. None of these values seems to be large 
enough to be used for estimating the typical score 
of an individual teacher. However, it is apparent
from equation (6), which may be written:


$$r = \sigma_t^2 / [\sigma_t^2 + \sigma_{tv}^2/m + \sigma^2/mn]$$


that by increasing either n (the number of observ-
ers on a team; that is, the number in the class-
room at one time) or m (the number of visits made
to the classroom) the reliability can be increased.
If n is allowed to increase without limit while m
remains the same, $R_{mn}$ approaches a value smal-
ler than one, because increasing n reduces observ-
er errors only, while increasing m reduces both
types of error. The question of an optimal way
of distributing a fixed number of observer-hours
(that is, how large to make n when the number
of observer hours mn is fixed) may be answered
on the basis of the graph shown in Figure 1.

Figure 1 shows the reliability of five Cornell
scales as a function of n, the number of observ-
ers visiting a teacher at the same time, when mn
(the number of observations) is equal to twelve.*
Thus, if one observer visits one teacher at a time,
twelve visits must be made, each observer visit-
ing the teacher on a different day; if two observ-
ers visit the teacher at one time, six visits must
be made; and so on, up to the case n = 12, in
which all twelve observers must visit the teacher
at one time.

The curves in Figure 1 show unmistakably 
that the reliability of each of the scales falls off 
as team size increases. For a given cost per ob- 
server-hour, there seems little doubt that the 
greatest precision per dollar spent is obtained by 
sending observers into classrooms one by one. 
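
The effect of team size at a fixed total of twelve observations can be reproduced numerically. The sketch below evaluates the reliability of a mean score, in the form in which the variance of the mean is written out term by term, for the Differentiation components of Table V; the function name is illustrative.

```python
def reliability(var_t, var_tv, var_e, m, n):
    """Reliability of a mean over m visits by teams of n observers."""
    return var_t / (var_t + var_tv / m + var_e / (m * n))

TOTAL_OBSERVATIONS = 12
curve = {}
for n in (1, 2, 3, 4, 6, 12):
    m = TOTAL_OBSERVATIONS // n   # visits needed for this team size
    curve[n] = reliability(6.62, 5.35, 6.71, m, n)

for n, r in curve.items():
    print(n, round(r, 3))
```

With mn held fixed, the observer-error term is the same for every team size, so the whole decline comes from the visit-error term, which grows as the number of separate visits shrinks.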

It might be remarked in passing that if the 
same observer visits a teacher twelve times, the 
reliability is substantially higher than when 
twelve different observers visit the teacher once 
each, because in the former case $\sigma^2 = 0$, and
hence the quotient in equation (6) is greater. How-




TABLE V
SUMMARY OF RESULTS OF RELIABILITY ANALYSES OF EIGHT CORNELL SCALES


                        Components of Variance                      Coefficient
                        True      Visit     Observer   Reliability  of Observer
Scale                   Score     Error     Error      Coefficient  Agreement

Activity                20.49     10.53     18.63        .41          .63
Variety                  1.51      0.00      2.13        .42          .42
Pupil Climate            0.00      0.00      1.85        .00          .00
Teacher Climate          2.75      0.00      5.78        .32          .32
Social Organization      3.82      2.96      3.52        .37          .66
Differentiation          6.62      5.35      6.71        .35          .64
Pupil Initiative         0.00      7.26      6.43        .00          .43
Content                  0.00     10.34     34.71        .00          .23


TABLE VI


ANALYSIS OF VARIANCE OF SCORES FOR WITHALL’S CATEGORY 1:
LEARNER-SUPPORTIVE STATEMENTS


                             Mean Squares
Source of
Variation           Observed     Expected

Teachers            81.015       $\sigma^2 + 2\sigma_{tv}^2 + 16\sigma_t^2$
Visits              98.623       $\sigma^2 + 2\sigma_{tv}^2 + 8\sigma_v^2$
Visit Error         24.468       $\sigma^2 + 2\sigma_{tv}^2$
Observer Error       1.703       $\sigma^2$

Total

FIGURE 1
The Reliability of Certain Cornell Scales
as a Function of Team Size when the
Total Number of Visits is Twelve

[Curves for Differentiation, Social Organization, Variety, Activity, and Climate; horizontal axis: team size.]


ever, such reliability is probably gained at the 
expense of validity, because observer biases 
are not cancelled out, but remain to distort 
teacher differences. 

Figure 2 shows the reliability to be expected 
for any number of visits (by a single observer)
up to twenty. The rate of increase varies for
different scales, but all of them level off. If a
reliability of .90 is required for a particular
purpose, sixteen visits would have to be made
if Differentiation, Social Organization, Variety,
and Activity are to be scored. Twenty visits
will bring Teacher Climate scores up to this lev-
el also.
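
The visit counts quoted above can be checked approximately from the tabled components. The sketch assumes one observer per visit (n = 1), so that both error components are divided by the number of visits; the component values are those of Table V, and the function name is illustrative. Small differences from .90 are to be expected, since the text appears to read the figures from a graph.

```python
def reliability_single_observer(var_t, var_tv, var_e, visits):
    """Reliability of a mean over a number of visits, one observer per visit."""
    return var_t / (var_t + (var_tv + var_e) / visits)

# Differentiation after sixteen visits.
diff_16 = reliability_single_observer(6.62, 5.35, 6.71, visits=16)

# Teacher Climate after sixteen and after twenty visits.
climate_16 = reliability_single_observer(2.75, 0.00, 5.78, visits=16)
climate_20 = reliability_single_observer(2.75, 0.00, 5.78, visits=20)

print(round(diff_16, 3), round(climate_16, 3), round(climate_20, 3))
```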


For these data, $H_1$ ($\sigma_{tv}^2 = 0$) was rejected and
$H_0$ ($\sigma_t^2 = 0$) remained in doubt, since the F ratio
of 3.31 falls between the .01 and .05 points of the 
F distribution. The components of variance at- 
tributable to observer error, visit error, and dif- 
ferences among teachers were estimated to be 
1.703, 11.383, and 4.552 respectively. 

Similar analyses were made of the scores for 
categories 3, 4, 5, 6, and the Climate Index de- 
fined above. The results are presented in Table 
VII in the form of estimates of variance compon-
ents and reliability coefficients. As before, com-
ponents not found to differ significantly from zero
are reported equal to zero; and when $\sigma_t^2$ does not
differ from zero, the corresponding reliability 
coefficient is reported as zero. No analysis of 
categories 2 and 7 was made since remarks clas- 
sified in these categories were so rare that an an- 
alysis of them did not promise to be fruitful. 

Two categories failed to show reliability great-
er than zero (Neutral and Reproving), and one
(Learner-supportive) remained in doubt. The low
coefficient of observer agreement reported for
Neutral statements indicates that the definition of
this category may be unclear; that for Reproving
statements is high enough to suggest that the fail-
ure of the scores to discriminate teachers is prin-
cipally due to instability of this aspect of teacher
behavior. This instability is also reflected in the
relatively larger component of variance due to vis-
it errors. The same conclusion is indicated re-
garding Learner-supportive statements. The re-
maining three categories have reliability coeffi-
cients around .50.

Curves like these in Figures 1 and 2 can be 
plotted for these data. When plotted, the curves 
indicated that all three of the ‘‘reliable’’ scales 
—Problem-structuring, Directive, and Climate 
Index—should reach a reliability of . 90 when they 
are based on ten visits. 


Apprication of the Design to the Tryouts 
of Withall’s Technique 


The second of the two techniques emp] oyed 
was based on Withall’s categories of verbal be- 
havior (15). The procedure used has been de- 
scribed elsewhere (8). The method consists in 
classifying the statements made by a teacher in- 
to seven mutually exclusive categories: 1. Learn- 
er-supportive; 2. Acceptant or clarifying; 
3. Problem-structuring; 4. Neutral; 5. Direc- 
tive; 6. Reproving, disapproving or disparaging; 
and 7. Teacher-supportive. The first three cat- 
egories were combined and the category obtained 
was called ‘‘Learner-centered’’ statements. 
The ratio of the sum of these three statement 
categories to the total number of statements 
made by a teacher is called the ‘‘Climate Index.’’ 

The tryouts with the Withall technique em- 
ployed two observers working as a team with 
four teachers in a single elementary school. 
Each teacher was visited by the team of two ob- 
servers on four occasions about a week apart. 
The observers remained in the classroom dur- 
ing each visit until approximately 100 statements 
had been classified. After comparing notes and 
clarifying the definitions of the categories, the 
same two observers visited each of the four 
teachers four more times at one-week intervals. 
Thus, there were available, finally, a total of 64 
counts in each category—corresponding to eight 


Discussion 


The results of the analyses presented above 
illustrate some of the practical advantages of the 
analyses of variance over correlation analysis in 
visits by two observers to four teachers; in the estimating the reliability of observational data 
notation of this report, m = 8, n= 2, N= 4. when more than two scores per person are avail- 

The count for each category for one period able. First, the analysis of variance yields a 
was divided by the total number of remarks tal- single estimate of the reliability coefficient which 
lied in that period. The proportion so obtained uses all of the information contained in the data; 
was then transformed to an angle measured in second, the analysis of variance makes it pos si- 


degrees by the use of the arc sine tr ansforma- 
tion (12:449-50). Such scores have the advan- 
tage of having standard errors of measurement 
which are independent of the magnitude of the 
score. 

The design used for analyzing each category 
of response was as shown in Table VI, which 
also gives the results for Category 1: Learner- 
supportive statements. 


ble to partition the error into components attrib- 
utable to different sources; and, finally, itis pos- 
sible (if the necessary assumptions are fulfilled) 

to test the significance not only of the coefficient 

obtained, but also of each component of error. 
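
As a rough illustration of how partitioned components can be put to use, the sketch below combines the three components reported for Learner-supportive statements (1.703, 11.383, and 4.552) into a reliability coefficient for m visits by n observers. The formula is a common variance-components form and is an assumption here, not a quotation from the article; it treats visit error as averaging over visits and observer error as averaging over all visit-by-observer records. Since the teacher component itself remained in doubt, the numbers are illustrative only.

```python
def reliability(var_teachers, var_visits, var_observers, m=1, n=1):
    """Assumed form: teacher variance over teacher variance plus averaged error.

    m is the number of visits and n the number of observers per visit.
    """
    error = var_visits / m + var_observers / (m * n)
    return var_teachers / (var_teachers + error)


# Components reported in the text for Category 1 (Learner-supportive statements).
VAR_OBSERVER, VAR_VISIT, VAR_TEACHER = 1.703, 11.383, 4.552

# One visit by one observer, and the tryout design (m = 8 visits, n = 2 observers).
one_visit = reliability(VAR_TEACHER, VAR_VISIT, VAR_OBSERVER)
tryout = reliability(VAR_TEACHER, VAR_VISIT, VAR_OBSERVER, m=8, n=2)

# Share of the error variance that comes from visit-to-visit instability.
visit_share = VAR_VISIT / (VAR_VISIT + VAR_OBSERVER)
print(round(one_visit, 3), round(tryout, 3), round(visit_share, 3))
```

Under these assumptions most of the error comes from visits rather than observers, which suggests that adding visits would do more for reliability than adding observers.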

In the study of the Cornell technique, for
example, no fewer than 36 correlation coefficients
estimating the reliability of the technique could
be obtained, all equally accurate. Each such
estimate would be a product-moment coefficient
based on 11 cases, so that the accuracy of any
one would be low. Moreover, each estimate
would have a considerable bias (4:205). A mean
of the 36 coefficients could be used, but the bias
would remain (or perhaps increase, since the
estimates would not be independent); and nothing
is known about the sampling error in such a
mean. The estimate obtained by analysis of
variance, however, is unbiased (4:225), unique, and
of known precision.


FIGURE 2
The Reliability of Certain Cornell Scales
as a Function of Number of Visits
[Curves labeled Differentiation, Variety, and Activity; horizontal axis: Visits]


TABLE VII
Summary of Results of Reliability Analyses of Six Withall
[remainder of title and table entries not legible in this copy]

In each of the examples given we have parti- 
tioned our error variance into two components, 
and shown how such a partition of errors can be 
used in planning further uses of the observation- 
al technique by indicating where some of the er- 
rors originate, and can yield estimates of differ- 
ent correlations. A different design could separ- 
ate errors due to differences in behaviors ob- 
served on different days from differences in be- 
haviors observed on the same day; errors due to 
differences between observers from differences 
in what a given observer sees in different five- 
minute periods during the same visit, etc. A 
‘‘reliability’’ coefficient corresponding to each 
type of error could be estimated, the relative 
importance of each source of error could be as- 
sessed, and plans for future observations could 
then be made more intelligently. 

When an instrument of low reliability is tried
out on a small scale, as is usually the case when
the instrument requires a rather large expenditure
of a trained observer’s time before even
one measurement is obtained, it is essential that
it be possible to test the hypothesis that the true
reliability of the scale is zero, as well as to
estimate its magnitude, since sometimes a reliability
large enough to appear useful may be
nonsignificant. Such tests are easily made as part
of the analysis of variance. It is also possible
to test whether a certain suspected source of
error (as, for example, observers’ fallibility) is
in fact making a significant contribution.

These advantages of analysis of variance
clearly indicate the unsuitability of the
correlational method when the data available include
more than two independent measures of each
individual. The only situation in which the latter
method might be useful is that in which a set of
N pairs of scores on equivalent forms of a test
is available. Indeed, the reliability coefficient
is often defined as the correlation between
scores on equivalent tests in the population of
individuals. It is natural to assume that the
correlation in the sample is the appropriate estimate
of the correlation in the population. But when
the correlation in question is a reliability
coefficient this is not true.

If we are correlating a test x and another
measure y, the population correlation may be
written as follows:

$$R_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y},$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of the
two measures and $\sigma_{xy}$ is their covariance. The
appropriate estimate of $R_{xy}$ from a sample of N
pairs of scores x and y is the product-moment or
interclass correlation, which may be written:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}, \tag{9}$$

where $s_{xy}$, $s_x$, and $s_y$ estimate $\sigma_{xy}$, $\sigma_x$, and $\sigma_y$.
But if we are correlating a test x and an
equivalent test x', the population correlation is

$$R_{xx'} = \frac{\sigma_{xx'}}{\sigma_x^2},$$

where $\sigma_x^2$ is the variance of either test, and $\sigma_{xx'}$
the covariance of the two. The product-moment
correlation coefficient used to estimate this would
be

$$r_{xx'} = \frac{s_{xx'}}{s_x s_{x'}},$$

where $s_x^2$ and $s_{x'}^2$ are estimates of $\sigma_x^2$ from
each of the two tests, and $s_{xx'}$ is the estimated
covariance. It is clear that we are using the
geometric mean of two sample estimates to estimate
the population variance, $\sigma_x^2$.

If we analyze the total variance of the N pairs
of scores x and x' into two components, one from
comparisons between individuals, with N - 1
degrees of freedom, and one from comparisons
within individuals or ‘‘error’’, with N degrees of
freedom, we can obtain the intraclass correlation r
as follows:

$$r = \frac{(\text{mean square between}) - (\text{mean square for error})}{(\text{mean square between}) + (\text{mean square for error})} \tag{10}$$

This is an estimate of $R_{xx'}$ which may be
shown to be related to the interclass correlation
$r_{xx'}$ as follows:

$$r = \frac{2 s_x s_{x'} r_{xx'} - K}{s_x^2 + s_{x'}^2 + K}, \tag{11}$$

where $r_{xx'}$, $s_x$, and $s_{x'}$ are as defined above,
$\bar{x}$ and $\bar{x}'$ are the test means, and

$$2NK = N(\bar{x} - \bar{x}')^2 + 2 s_x s_{x'} r_{xx'} - s_x^2 - s_{x'}^2.$$

It can be shown that K is never negative, and
that when K is greater than or equal to zero, r is
smaller than $r_{xx'}$. We may, therefore, say that
the estimate $r_{xx'}$ is always greater than the
estimate r.

Fisher (4:205, 211 ff.) points out that $r_{xx'}$
systematically overestimates $R_{xx'}$, but that r does
not, and that the latter estimate is more precise
than the former. The bias in $r_{xx'}$ is small
except when r is small, and the difference in
precision is slight.

The procedure usually followed in actual 
practice is to analyze the variance of the 2N 
scores into three components rather than two; 
one for differences between individuals, with 
N - 1 degrees of freedom, one for differences 
between test means with one degree of freedom, 
and one for residual or error, with N-1 de- 
grees of freedom. The estimate r of reliability, 
as defined in formula (10) above, may then be 
written: 


$$r = \frac{2 s_x s_{x'} r_{xx'}}{s_x^2 + s_{x'}^2} \tag{12}$$

Comparing this with formula (9) we see that r
differs from $r_{xx'}$ in that it uses the arithmetic
mean of the two sample estimates of $\sigma_x^2$ instead
of the geometric mean.
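
The relation between the interclass and intraclass estimates can be checked numerically. The sketch below computes the mean squares for the two designs described above, applies formula (10) to each, and compares the results with closed forms written out from the definitions of the mean squares. The paired scores are hypothetical, invented only for illustration, and the function names are not from the article.

```python
import math


def moments(x, x2):
    """N, the two means, the two unbiased variances, and the covariance of paired scores."""
    n = len(x)
    mx, m2 = sum(x) / n, sum(x2) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    v2 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    cov = sum((a - mx) * (b - m2) for a, b in zip(x, x2)) / (n - 1)
    return n, mx, m2, vx, v2, cov


def interclass_r(x, x2):
    """Product-moment correlation: covariance over the geometric mean of the variances."""
    _, _, _, vx, v2, cov = moments(x, x2)
    return cov / math.sqrt(vx * v2)


def mean_squares(x, x2):
    """Mean squares between individuals, within individuals, and residual."""
    n = len(x)
    grand = (sum(x) + sum(x2)) / (2 * n)
    ss_between = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in zip(x, x2))
    ss_within = sum((a - b) ** 2 / 2 for a, b in zip(x, x2))
    ss_tests = n * (sum(x) / n - sum(x2) / n) ** 2 / 2
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / n                       # two-component design, N df
    ms_residual = (ss_within - ss_tests) / (n - 1)  # three-component design, N - 1 df
    return ms_between, ms_within, ms_residual


def intraclass_r(ms_between, ms_error):
    """Formula (10): difference of the mean squares over their sum."""
    return (ms_between - ms_error) / (ms_between + ms_error)


def closed_form_two_component(x, x2):
    """Formula (10) for the two-component design, expressed through K."""
    n, mx, m2, vx, v2, cov = moments(x, x2)
    k = (n * (mx - m2) ** 2 + 2 * cov - vx - v2) / (2 * n)
    return (2 * cov - k) / (vx + v2 + k)


def closed_form_three_component(x, x2):
    """Formula (12): covariance over the arithmetic mean of the variances."""
    _, _, _, vx, v2, cov = moments(x, x2)
    return 2 * cov / (vx + v2)


# Hypothetical scores of eight individuals on two equivalent forms.
form_a = [12, 15, 9, 20, 17, 11, 14, 18]
form_b = [14, 15, 12, 19, 20, 11, 17, 18]

ms_b, ms_w, ms_res = mean_squares(form_a, form_b)
r_interclass = interclass_r(form_a, form_b)
r_two = intraclass_r(ms_b, ms_w)       # two-component design
r_three = intraclass_r(ms_b, ms_res)   # three-component design
print(round(r_interclass, 4), round(r_two, 4), round(r_three, 4))
```

For these illustrative scores the two form means differ, K is positive, and both intraclass estimates fall below the interclass coefficient. The three-component estimate cannot exceed the interclass coefficient whenever the covariance is positive, because the arithmetic mean of two variances is never smaller than their geometric mean.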

The internal consistency reliability of a test
may be estimated from an analysis of variance
of N pairs of half-test scores in either of the two
designs described above from the formula:

$$r = \frac{(\text{mean square between}) - (\text{mean square for error})}{(\text{mean square between})}$$
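
The half-test formula above can be read as formula (10) followed by a Spearman-Brown doubling, since 2r / (1 + r) with r taken from formula (10) simplifies algebraically to the ratio just given. The sketch below checks that identity for hypothetical mean squares; the values 30 and 6 are invented for illustration and the function names are not from the article.

```python
def half_test_r(ms_between, ms_error):
    """Formula (10): reliability of a single half-test."""
    return (ms_between - ms_error) / (ms_between + ms_error)


def full_test_r(ms_between, ms_error):
    """Internal consistency of the full test from the half-test mean squares."""
    return (ms_between - ms_error) / ms_between


def doubled_length(r_half):
    """Spearman-Brown projection for a test twice as long as each half."""
    return 2 * r_half / (1 + r_half)


# Hypothetical mean squares from an analysis of N pairs of half-test scores.
MS_BETWEEN, MS_ERROR = 30.0, 6.0

r_half = half_test_r(MS_BETWEEN, MS_ERROR)
r_full = full_test_r(MS_BETWEEN, MS_ERROR)
print(r_half, r_full, doubled_length(r_half))
```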


Summary 


A procedure for estimating the reliability of 
scores based on observations of behaviors was 
described, and its use illustrated in two some- 
what different situations. A discussion of the 
relative merits of analysis of variance and cor- 
relational analysis as techniques for estimating 
reliability coefficients led to the conclusion that 
the former has three distinct advantages over 
the latter. It yields a single best estimate of re- 
liability; it supplies independent measures of the 
amount of error from different sources, and it 
provides for simple, exact tests of significance. 
When only two sets of measurements are avail- 
able, an estimate of reliability may be obtained 
by correlational analysis, but it is biased and 
has a larger sampling error than that obtained 
by analysis of variance. When more than two 
sets of measurements are available, no satisfactory
estimate can be obtained by correlational
analysis. We, therefore, suggest that the
use of the correlational technique be limited to 
validity estimation, and that the analysis of var- 
iance be adopted as the standard procedure for 
estimating reliability. 


REFERENCES


1. Alexander, H. W. ‘‘The Estimation of Reliability
When Several Trials Are Available,’’
Psychometrika, XII (1947), pp. 79-99.

2. Brown, William. ‘‘Some Experimental Results
in the Correlation of Mental Abilities,’’
British Journal of Psychology, II (1910),
pp. 296-322.

3. Cornell, F. G., and others. An Exploratory
Measurement of Individualities of Schools
and Classrooms (Urbana, Ill.: Bureau of
Educational Research, College of Education,
University of Illinois, 1952).

4. Fisher, R. A. Statistical Methods for
Research Workers (London: Oliver and Boyd).

5. Hoyt, Cyril. ‘‘Test Reliability Estimated by
Analysis of Variance,’’ Psychometrika, VI
(1941), pp. 153-60.

6. Jackson, R.W.B. ‘‘Reliability of Mental
Tests,’’ British Journal of Psychology,
XXIX (1934), pp. 267-87.

7. Jackson, R.W.B., and Ferguson, G. A. Studies
on the Reliability of Tests, Bulletin 12
(Toronto: Department of Educational
Research, University of Toronto, 1941), p. 132.

8. Lindquist, E. F. Design and Analysis of
Experiments in Psychology and Education
(Boston: Houghton Mifflin Co., 1953).

9. Medley, D. M., and Mitzel, H. E. Studies
of Teacher Behavior: The Refinement of
Two Techniques for Observing Teachers’
Classroom Behaviors, Publication No. 28
(New York: Office of Research and Evaluation,
Board of Higher Education of the City
of New York, 1955).

10. Mitzel, H. E., and Rabinowitz, W. ‘‘Assessing
Social-Emotional Climate in the Classroom
by Withall’s Technique,’’ Psychological
Monographs, LXV (1953).

11. Pilliner, A.E.G. ‘‘The Application of Analysis
of Variance to Problems of Correlation,’’
British Journal of Psychology: Statistical
Section, V (1952), pp. 31-38.

12. Snedecor, Statistical Methods (Ames,
Iowa: Iowa State College Press, 1946),
p. 485.

13. Spearman, Charles. ‘‘Correlation Calculated
with Faulty Data,’’ British Journal of
Psychology, III (1910), pp. 271-95.

14. Wandt, E., and Mitzel, H. E. Studies of
Teacher Behavior: Plan for a Program of
Research, Publication No. 21 (New York:
Office of Research and Evaluation, College
of the City of New York, 1954).

15. Withall, J. G. ‘‘The Development of a Technique
for the Measurement of Social-Emotional
Climate in the Classroom,’’ Journal
of Experimental Education, XVII (1949), pp.
347-61.


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, September 1958) 


A DESCRIPTIVE ANALYSIS OF A DEPART- 
MENTAL CURRICULUM IMPROVEMENT 
PROJECT IN AN URBAN JUNIOR 
HIGH SCHOOL 


WILLIAM M. RASSCHAERT* 
Detroit Public Schools 


SECTION I 


Statement of the Problem 


IN A LARGE city school system employing 
over nine-thousand teachers it is readily conceiv- 
able that there will be many differences among 
teachers in terms of their methods, philosophy, 
goals, behavior, and their attitudes toward
children and adults. Even in a single school, wide
differences are not hard to find, yet most schools
seem to operate in a fairly smooth, efficient,
and productive manner. Looking still more
closely, those of us who are on the ‘‘inside’’ of
the educational scene have seen, in all probability,
evidence of differences of varying degrees
within even the smallest formal school-unit, the
department. It is this unit, considered as a
functioning group, with which our study was
concerned. Specifically, the problem was to
analyze and describe what happened when a
teacher-administrator group initiated, organized,
conducted, and evaluated a curriculum improvement
project in one department of an urban junior high 
school over a forty-week period. 


Major Hypotheses, Related Questions 


The plan and procedures of this study were 
aimed at finding answers to the following ques- 
tions: 


1. What changes, if any, occur in the teach- 
ers’ perceptions of their own responsibilities in 
teaching when they become involved in coopera- 
tive curriculum improvement? 

2. What will be the outcomes of a small devel- 
opmental study which seeks to stimulate by large- 
ly non-directive means the improvement of in- 
struction in a single department within one ur- 
ban junior high school? 

3. What conditions or factors appeared to be
influential in tending to make the teachers
productive and creative in the group?


Five Major Hypotheses 


The hypotheses tested in this study are stated 
below. 


1. There will be tangible modifications in 
classroom practices of the teachers in- 
volved in the study. 

2. A variety of instructional methods will be 
tried and tested. 

3. There will be an increase in the confidence
of teachers in defining problems. 

4. Teachers will feel increasingly secure in 
exchanging suggestions with each other. 

5. The Head of the English Department and the 
Principal will allow and encourage teachers 
to try, test, and develop newer methods, 
techniques, and courses. 


Related to the testing of these hypotheses, 
some data on the questions below were looked for
and examined. 


1. What are the strong points of the action re- 
search method and the work-group-confer- 
ence technique? 

2. Under what conditions can these methods 
best be used in creating curriculum change? 

3. What are the limitations of the methods and 
their application? 


Two Assumptions Underlying the Study 


Two basic assumptions were made by the writ- 
er at the outset of the study. 


Assumption One: The curricular experiences
of pupils are determined in large measure
by the values, goals, skills, and attitudes
held by teachers.

Assumption Two: In order to change the
curriculum it follows that there must be
undertaken an attempt to change the values,
goals, skills, and attitudes of the people
involved in respect to education (2), but
more specifically in respect to interpersonal
relations among members of a working
group.


*An abstract of an unpublished Ed.D. dissertation, Wayne State University, 1957.


The ideas expressed in these assumptions were 
accepted and shared, at the beginning of the 
study, by the Supervisor of Language Education, 
the principal of the participating school, and the 
writer. In addition, the Director of Language 
Education for the city school system also en- 
dorsed these assumptions when he actively 
launched the project. 


Background and Significance of Curriculum 
Development 


One of the greatest chronic problems in
education in the United States during the past
thirty-five years has been the apparent lack of
utilization of research findings by teachers in the
nation’s classrooms. Tremendous quantities of
research results fill library shelves and,
although a great part of these research findings
could be of inestimable value, they have been 
barely tapped. There are many reasons for this 
rejection or ignoring of research on the part of 
teachers, but, reasons or not, unless this pat- 
tern is changed the youth of our country will con- 
tinue to pay the price in the form of less full, 
beneficial, crucially-needed education. 

The background of one-hundred years of cur- 
riculum development is briefly and concisely 
presented in an NEA bulletin, 100 Years of Cur- 
riculum Improvement, 1857-1957 (1). Prepared 
by the Association for Supervision and Curricu- 
lum Development, the bulletin traces major 
changes in the concept of learning, teaching and 
curriculum improvement. Briefly, some of
these are: 


1. The change from the faculty psychology of 
learning to an organismic, dynamic psy- 
chology... with emphasis on meaning, goal- 
seeking and integration in the learning pro- 
cess. 

2. Change from reliance on tradition and
subjective judgment...to concern for scientific
research and the application of scientific
methods and findings.

3. Changes in methods and materials (used 
in teaching) have grown out of the idea that 
how we learn is as important as what we 
learn.

4. Changes in our total approach to children
in learning situations have been influenced
by the findings in the field of Child Growth
and Development.

5. Changes in patterns of participation in
curriculum building:

a) Fixed body of subject matter set by
‘‘experts’’ to

b) Shared participation (teachers, pupils,
lay people) led by

c) Administrators, supervisors, and
resource persons.


Action Research 


Action research, based on the ‘‘field theory’’ 
psychology of Kurt Lewin (5), is the newest
research approach to educational problems because
it has within it the potential to apply experimental 
social psychology to ‘‘natural’’ social groups. 
Good action research employs mathematical 
means of measurement and testing, statistical an- 
alysis and other tools of fundamental research. 


Teacher Participation 


During the past one hundred years the changes 
in ‘‘Who shall build the curriculum?’’ have been 
most marked. The current view that teachers 
should be given the opportunity to participate in 
curriculum improvement is perhaps the biggest 
step forward in the direction of actually bringing 
results of research into the classroom. Admin- 
istrators, teachers and supervisors have an op- 
portunity now, as never before, to work together 
cooperatively to improve all phases of education. 


Factors in Cooperative Effort 


To create actual improvement in the classroom
it is imperative that teachers understand,
appreciate, develop and apply research. In order
for them to do this they must be given opportunity
to learn, change and improve. Administrators
and supervisors must afford this kind of
opportunity. They must structure a framework
which is conducive to good personal relationships
and gives a chance for free expression and for the
developmental acquisition of research skills. Finally,
the administrators and supervisors must be willing
to support cooperatively created changes in
the total curriculum.


The Work-Group-Conference Method 


Meier, Cleary and Davis (6) drew on a number
of fields to create a technique of cooperative
action which they labeled the ‘‘work-group-conference
method’’. This method is one of the newer
tools with which a good human relations, action
research approach can be realized functionally.
It seems to have within it the potential
to release the creative and productive talents of
people working and acting in harmony. It seems
to be a technique by which supervisors, consultants,
and staff generally, as well as principals
and other administrators, can achieve successful
improvement of their own and others’ behavior
and, consequently, clearer thinking and sharper
action to reduce problems inherent in the
educational scene.


A Means of Problem Attack, Action, and 
Developing Research Abilities 


The work-group-conference method lends
itself ideally to problem solving because it has a
social-psychological basis which encompasses
the total aspects of the individual, the group, 
and the environment in which these operate at a 
given time. A supervisor, when facing instruc- 
tional problems, can employ the technique of 
work-group-conference method in an action re- 
search frame and lead in helping teachers to 
solve the problems in a cooperative and scien- 
tific manner. 

Stephen M. Corey lists the following as ‘‘sig- 
nificant elements of a design for action re- 
search’’ (4): 


1. The identification of a problem area about 
which an individual or a group is sufficient- 
ly concerned to want to take some action. 

2. The selection of a specific problem and the
formulation of a hypothesis or prediction 
that implies a goal and a procedure for 
reaching it. This specific goal must be 
viewed in relation to the total situation. 

3. The careful recording of actions taken and 
the accumulation of evidence to determine 
the degree to which the goal has been 
achieved. 

4. The inference from this evidence of
generalizations regarding the relation between
the actions and the desired goal.

5. The continuous retesting of these generali- 
zations in action situations. 


Bases of Action Research (Summary) 


1. It is based on the social dynamics theories 
of Kurt Lewin. 

2. The psychological basis of the social dyn- 
amics theory is grounded on what is gener- 
ally termed the ‘‘field’’ theory-type of psy- 
chological action. 

3. Action research is usually carried out in a
field setting in contrast to a laboratory set- 
ting. 

4. It is an extension of basic social research
and includes in its methods the utilization
of mathematical and conceptual problems
of theoretical analysis.

5. This research lends itself to immediate
application in on-going developmental
situations.

6. Although it is not an inherent characteristic 
of action research that it always be a coop- 
erative enterprise, the application of find- 
ings is usually more effective if the investi- 
gator works in close collaboration with the 
persons of the agency or institution being 
studied. 


Philosophy of Cooperative, Developmental 
Improvement 


One of the greatest deterrents to research on 
the part of teachers (and also on the part of super- 
visors who want to involve teachers in research) 
is the fear, apparently, that the research will not 
conform to ‘‘high standards’’. Also, on the part 
of teachers, research in the traditional sense 
seems too far removed to be of fairly immediate 
help with problems of highly immediate import- 
ance. 

The work-group-conference method encourages
developmental growth in teachers’ research
abilities. Within a typical teacher-administrator
or other adult group several levels of sophistication
in research ability will usually be found.
Assuming that the group is well led and that
conditions necessary for its successful operation are
present, the members are likely to become secure
and reasonably confident in attempting to set
up a design and try objective problem solving.


The fact that attempts at problem solving 
fall at various points on a continuum rang- 
ing from careless, untested inquiry to 
careful and reliable research is rarely em- 
phasized. This is regrettable, because, 
although teachers and other people value 
research in the abstract, they feel that it 
has little relation to the methods they must 
employ to solve their own problems. There 
is little motivation for practical problems 
to move in the direction of better and bet- 
ter research methods. They are learned 
with practice. To refrain from trying be- 
cause one lacks skill or has perfectionist 
aspirations precludes improvement, and 
improvement is what counts. (4) 


It is against this background of supervision and 
curriculum development theory and current con- 
cepts of research in this field that the problem of 
this study took form. 


SECTION II
Structure and Development of the Study


THE GENERAL plan of the study included:
1) inviting teachers to participate voluntarily in
a two-semester project; 2) enlisting the support
of the principal, department head, and specialist
supervisor; 3) providing for biweekly meetings
over two full semesters; 4) providing for
guest speakers, special materials, films, visits
to other schools, etc.; 5) the author of this
report serving as coordinator and organizer of
group activities, encouraging members of the
group to undertake experimentation in their own
classes and to report back to the group; 6)
measuring teachers’ perception of roles of group’s
status people.

The report of the study described four phases 
of our group’s development, phases similar to 
those that are traced by Thelen and Dickerman
in ‘‘Stereotypes and the Growth of Groups’’ (7). 
Because we followed the pattern of tracing major 
phases of the group’s growth, it is important to 
list briefly questions related to each phase (3). 


1. What happened to the project of our group?

2. What happened to our group and the
individuals in it?

3. What blocked the work of our group? 

4. What facilitated the work of our group? 


A final item under each phase was: 


5. Summation of evidence and interpretations 
of each phase. 


Methodology: Procedure and Sources of Data 


This project was an action research, cooper- 
ative type of study. All the participants worked 
on one major problem: the improvement of
instruction in English at this one junior high school.
The major features of the methodology include 
the following: 


1. Each teacher had the freedom to work on 
a specific problem in the English (Language 
Arts) area. 

2. Teachers were encouraged to work in a 
manner of their own choice: a) cooperatively on 
one problem; b) individually on separate prob- 
lems within the scope of the English curriculum; 
or c) in freely formed subgroups on one or sev- 
eral problems. 

3. The teachers’ populations for study were 
their own pupils from one or more of their own 
classes and/or the available data on all pupils 
(cumulative records, reading test scores, fam- 
ily background information, etc.). 

4. The writer’s population for the study was 
not the pupils but all the twelve members of the 
study group. 


Part of the methodology includes definition
of the roles of various people in the study: 1) the
principal, 2) the department head, 3) the
supervisor, 4) the coordinator, and 5) the roles of
the nine participating teachers.


* All footnotes will be found at end of article.

The work-group-conference method was in- 
trinsic to the broad action research methodology 
of the whole project. The total group met about 
every two weeks (total of eighteen meetings in two 
semesters) after school for planning, discussion, 
reporting, and evaluation. The average length 
of each meeting was two hours and fifteen minutes.
The group focused its attention on ‘‘content’’,
viz., various aspects of English: reading,
grammar, composition, testing, spelling, hand- 
writing and other things. The writer was con- 
cerned with interaction, the dynamics of the group 
situation and any interaction between meetings as 
well as with the English curriculum, the teaching 
of reading, spelling, writing, listening, grammar, 
etc. 


Types and Sources of Data Used*


The following is a descriptive listing of the 
types and sources of data which were obtained 
over the entire forty-week period, starting in Sep- 
tember 1955 and ending in June 1956. Some addi- 
tional data were obtained in August and September 
of 1956, and this will be the last item on this list. 
Because the data were gathered in an on-going, 
evolving ‘‘situational frame’’, no attempt is made 
here to place the items chronologically in terms 
of ‘‘when’’ they were collected. 


1. Descriptions of Individual Research Projects.
Each teacher submitted a ‘‘Progress Report’’
on his research project during the thirty-second
week of the Project. In the fortieth week,
each teacher contributed a Final Written Report
on his project in which he described, analyzed,
and interpreted data gathered in his experiment.

a) These reports were consolidated into a 
Group Final Report and submitted to the 
Director of the Language Education De- 
partment. 

b) Each member of the Study Group re- 
ceived a hectographed copy of the Group 
Final Report. 

2. Oral Reports. Some members of the group
gave oral reports on their projects during the 
course of the Study. A discussion period fol- 
lowed each of the reporting sessions. 

3. Evaluation of the Project. Each member of
the committee (group) was asked to respond (in
writing) to the following: ‘‘What data, ideas,
opinions or impressions have you gained from (a) the
particular project you selected, and (b) what
effect have the Study and the conferences had upon
your approach to your teaching problems?’’

a) This was done partially in the Progress
Reports mentioned in item 1 in this list.

b) More evidence, although again only
partial, was obtained in a more controlled
manner through administering an
‘‘Opinionaire’’; this device was administered
twice, the second time at the suggestion
of the group.

4. Statements of Purpose. During the fourth 
week of the project each member was asked to 
state, in writing, what he perceived his own 
purpose to be in participating in the Study. These
‘‘Statements of Purpose’’ were compared to
‘‘Self-Evaluation’’ statements completed at the
end of the study. 

5. Records of Group Meetings. A factual 
record (minutes) of each meeting was kept. Each 
set of minutes was analyzed and interpreted by 
this writer. 

6. Records of Conversations and Consultations.
Insofar as it was possible to be accurate
and objective, several relevant talks between
this writer and individual members of the group
were described.

a) We tried to develop here the ‘‘Key Peo- 
ple’’ concept and how it relates to friend- 
ship factors and informal communica- 
tion. 

b) Attempts were made here, also, to show 
(1) comparison of some members’ pri- 
vately expressed views on the project 
with those expressed at Group Meetings; 
(2) values of liaison between the coordi- 
nator and key members of the group 
outside of regular group meetings. 

7. Measures of Rise and Fall of Members’ Interest
and Attitudes Toward the Project. ‘‘End-of-Meeting
Evaluation Slips’’ were given to the
group members periodically. Each individual
filled out such a slip.

a) A ‘‘Consolidation Sheet’’ showing in
minute detail all the responses made on 
the individual End-of-Meeting Slips was 
prepared by the coordinator and given 
to each group member. 

b) Both the individually completed slips 
and the Consolidation Sheet show rise 
and fall of interest as well as general 
attitude. 

c) The ‘‘Slips’’ gave us an index on each
individual while the ‘‘Consolidation
Sheet’’ showed a group (total) reaction.

8. Data on Power Structure. Evidence was 
gathered indicating what the group members per- 
ceived to be the power structure of the total 
group. 

a) An ‘‘Opinionaire’’, a type of projective 
instrument, was prepared by this writer 
and administered to all members of the 
group. 


b) The Opinionaire sought to discover specifically
‘‘what the teacher-members perceived
the roles of the principal, department
head, supervisor, and the coordinator
to be in this study.’’

c) The Opinionaire was given a second time
to only the teacher-members at their own
suggestion; the results of the first and second
responses will be compared.

9. Evidence of Attitudinal Changes in Each of
the Status People in the Group. This evidence is
gathered from all of the sources mentioned above
but treated separately in order that discrete statements
may be made about each of the three ‘‘status’’
people: the principal, the department head, and
the supervisor.

10. Self-Evaluation Statements. Some members
of the group were invited to submit a descriptive
statement in which they attempted to answer:
‘‘What did involvement in this project do for me
personally?’’ The responses to this question will
also be compared to the ‘‘Statement of Purpose’’
prepared at the beginning of the study. The Self- 
Evaluation statement was asked for during the 
summer following the termination of the study. 


Anticipated Outcomes of the Study 


1. It was felt that the study would give us var- 
ious kinds of evidence regarding the practicality 
and efficiency of using the work-group-conference 
method to accomplish curriculum change in a field
situation over a relatively short period of time 
but under intensive application. 

2. Some anticipated, specific outcomes in the 
field of supervision were focused on questions 
such as the following: 

a) Can teachers’ values and attitudes rela- 
tive to education be effectively changed 
through involvement in cooperative group 
effort in curriculum improvement? 

b) Will the changed values and attitudes be 
reflected in the curriculum?

c) What are the elements in a group situa- 
tion that help people work together har- 
moniously? 

d) What conditions are conducive to stimula- 
ting people to be creative in a group pro- 
ject? 

e) What factors help or hinder communication
among people in a group?

f) Are ‘‘key’’ people needed to initiate, develop,
and maintain a group as it passes
through the various stages of development?
If so, who are they? What identifies
them?

3. In regard to working on a departmental lev- 
el, some evidence should be of value to depart- 
ment heads and others interested in working to- 
ward change through the departmental unit. 

4. For teachers and others interested in attempting
further change of the curriculum in the
English or Language Arts area, this Study
should give insights to the following:

a) What aspects of the English curriculum 
do the teachers rate ‘‘most important’’? 

b) Are teachers’ differences more appar- 
ent than real in their view and practice 
of teaching English? 

c) Can ‘‘pilot groups’’ in one or more 
schools effectively influence curriculum 
patterns in other schools of the same 
system?

The study succeeded, we believe, in making 
concrete the vague ‘‘intangibles’’ that comprise 
what is known as a ‘‘group’’. Communication
between people was one of the common factors 
examined directly and indirectly. It would be 
well for the reader to remember at all times the 
importance the writer gave to the communication 
factor throughout the study. 


Limitations of the Study 


Generalizations cannot be made from this 
study to any other population but that involved in
the project. This is a descriptive study, one 
which employs a case-study approach. The four- 
teen teachers and administrators involved as 
well as the pupils with whom the teachers worked 
are the limits of the population to which general- 
izations can be applied. 

Because it did utilize the case-study approach, 
however, the results can serve as an indication 
of what could be expected of the work-group- 
conference method, action research, etc., under 
reasonably similar circumstances. 

Another limitation arises from the fact that 
the coordinator-recorder and the teachers were 
obviously subject to error in recording, trans- 
posing, documenting and interpreting data. This 
was constantly and continuously checked, how- 
ever, and all minutes and other written data 
were examined by the teachers at every group 
meeting and approved by them. 


The Participants— Teachers 


There were originally eleven teachers in the 
project, all teachers of English but some with 
specialized jobs within the department. Two 
taught ‘‘general language’’ in addition to English.
One taught journalism and acted as adviser to the 
school newspaper. A fourth member had a ‘‘ra- 
dio workshop’’, a complete broadcasting studio 
which held regular daily classes. This teacher 
was also the building audio-visual chairman. A 
fifth teacher was actually a member of the Social 
Studies Department but taught remedial reading
in the English Department. Another type of dis- 
tinction among them was the grade levels taught. 
One particular teacher handled seventh grade
English classes only, while another taught exclusively
ninth graders.

The age range of the members (an all white 
group) was from 22 to over 60 years. Teaching 
experience varied from one to more than 40 years. 


Participants—Leadership and Administration 


There were four people involved in the study 
who by their formally designated positions had to 
assume leadership and administrative responsi- 
bility for the total project. These people were 
the Supervisor of Language Education, official 
representative from Division of Instruction, De- 
troit Public Schools; the writer who served as co- 
ordinator and group recorder; the school princi- 
pal who was actually ex-officio chairman of the 
group and through whom most of the group’s decisions
affecting curriculum changes and classroom
experiments had to be cleared; and the head
of the department in Urban School. The latter
worked with the coordinator in forwarding communications
to the group, making emergency adjustments
of meeting schedules, and in making
available time or materials needed by teachers 
carrying out their individual research projects. 

The Director of Language Education, though 
never deeply nor personally involved in the project,
approved it formally. His assistant, the
Supervisor of Language Education, was the real 
administrator-participant, however. It was she 
who helped plan, formulate, and direct the pro- 
ject and set at least part of its major goals. 

The coordinator’s responsibilities included 
initial planning with other group administrators, 
arranging for meetings, finding clerical assis- 
tance, keeping the group informed and directed 
and, finally, leading the group to reporting and 
evaluation. 


The Purposes of the Study Group as They 
Were Perceived by Administrators of the 
Project—A Restatement 


Initially the administrators perceived the 
goals of the group differently than did the teachers.
Below is a specific restatement of purposes
of the group as seen by the coordinator, supervis- 
or, principal, and department head. 


1. To stimulate teachers, by largely non-directive
means, to organize, conduct and
evaluate a curriculum research project
within their own department in their own
school.

2. To develop in teachers abilities and insights
related to: a) appreciation of utilizing
research methods; b) applying, testing,
and evaluating results of their own and
others’ research in the classrooms;
c) values of cooperative effort to improve
the curriculum of their own department; and
d) perception of themselves and others
from the view of their own values, skills,
and attitudes as these influence the curricular
experiences of pupils.

3. To develop leadership abilities of the teachers
and to bring them closer to the realization
that leadership shifts in a group
situation from one person to another.

4. To motivate teachers to creativity both in
the group and in the classroom.

5. To emphasize that research ability is a
developmentally acquired skill and encourage
teachers to work at it.


Purposes of the Group as Proposed for the 
Teachers by Administrators of the Group 


1. To improve the instructional program in 
English at Urban Junior High School. 

2. To try new methods, materials, and tech- 
niques in the classrooms. 

3. To understand better existing methods, 
practices, materials, and techniques. 

4. To contribute, ultimately, the results of 
the group’s work to a new Curriculum
Guide in English for Junior High Schools. 


Procedure 


The project was organized around two focal 
points, in terms of its operation: 1) regular 
group meetings held after school at Urban Jun- 
ior High every second week for two consecutive 
semesters, and 2) the carrying out of instruc- 
tional change—experiments, tests, re-examin- 
ation of established methods and materials—by 
the teachers with their pupils in the regular 
classroom situations. The biweekly meetings
were aimed at planning, discussion, and evalua- 
tion leading always to application or modifica- 
tion of the classroom research being done by the 
teachers. 

Early in the project it was decided by the 
group that the coordinator would be doing a mu- 
tual service for himself and the group by acting 
as recorder. The notes or minutes of each 
meeting, as well as other material needed by
the group, were then hectographed at the home
of one of the group members. This person hectographed
almost all the materials which grew
tographed almost all the materials which grew 
out of the group project. 

The coordinator was present at every meet- 
ing (eighteen in all) of the group. The principal, 
department head, and the supervisor were not 
always present, and when they were, did not always
stay for the entire meeting. This was especially
true of the principal, who felt, for a time,
that his voluntary withdrawal from some meetings
might ‘‘free’’ the teachers from a restraint
often existent when a status person is in the group. 




Periodically, end-of-meeting evaluation slips 
were filled out by teachers and other types of 
data collected from them. Whenever such data 
were requested it was made explicit that the data 
would be used both for feedback to the group and 
for the writer’s dissertation. 


SECTION III


Interpretation of Data2 


THE FOCUS of the data-gathering instruments
was on tracing (1) developmental growth of the
participants in skills and insights related to interpersonal
relations, (2) research abilities and appreciations,
(3) changes in self-perception,
(4) awareness and understanding of the group’s
power structure, (5) evaluation skills, (6) com- 
munication patterns, and (7) appraisal of methods 
and materials in the English curriculum of Urban 
Junior High School. 

The interpretation of the data was done by var- 
ious means. Very little of the data could be 
quantified or measured by existent mathematical 
and statistical methods. For example, resistance
to an idea or to a person might be expressed
in many ways: facial expressions, verbal re- 
sponse or withdrawal, bodily movement, degrees 
of hostility or enthusiasm, etc. For these kinds 
of data, direct observation and subsequent inter- 
pretation by the coordinator were utilized. 

The Anecdotal Records of Group Meetings 
were interpreted by presenting a verbatim ac- 
count of each meeting and analyzing the state- 
ments made against the background of the total 
context of that meeting, the total project, the 
participants themselves, and the ‘‘natural social 
group’’ factors in a field situation. Also, these 
meetings included and were influenced by activi- 
ties of the participants: discussion, planning, pre- 
senting reports, giving research findings, etc. 
What was said in the meetings was frequently com- 
pared to what was actually done. 3 

Comparison, then, was a useful tool in the in- 
terpretation of data. Various kinds of data were 
compared. Some examples of these include the 
following: 


1. Comparison of each member’s ‘‘Statement
of Personal-Professional Purpose for Participation
in the Project’’ (written early in the study)
with ‘‘Description of Individual Research Projects’’
and ‘‘Self-Evaluation Statements’’ (the latter
two statements made at the end of the study).

2. As evaluation of the project was constant
and continuous, bimonthly sources of data like
‘‘End-of-Meeting Evaluation Slips’’ and ‘‘End-of-Meeting
Consolidation Sheets’’ were compared.
These were further compared to each member’s 
behavior in the group and to his reported re- 
search and teaching activities between meetings. 




3. Oral and written reports by individual 
members were compared to their original “Prob- 
lem Selection” and “Statement of Purpose”. 

4. Each member of the group was asked to 
answer, in writing, the following questions after 
the first half of the study was completed: 


‘‘What data, ideas, opinions or impressions
have you gained from a) the particular
project you selected, and b) what effect
have the study and the conferences had upon
your approach to your teaching problems?’’


The responses to these questions were also compared
to items 1-3 above.


5. Data on the group’s power structure were
obtained by an ‘‘Opinionaire’’,4 a projective-type
instrument. The first time the members
completed this instrument they were asked to respond
as ‘‘a typical member of the group’’. Evidence
showed they had not answered as a ‘‘typical’’
member but in reality had projected their
own personal opinions. At the request of the
members the instrument was presented a second
time, two weeks later. On this occasion each
member requested to be allowed to answer with
his own opinion, not that of a hypothetical ‘‘typical’’
member. After completion of these opinionaires
(using the same items), comparison was
made to the first responses.


There were 36 questions in each opinionaire. 
Each of the nine teacher members completed 
each opinionaire fully on both occasions. In only
nine instances of a total of 648 responses were 
there any absolute changes in response. None 
of these differences were statistically signifi- 
cant at the five percent level. 
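
The response counts reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original study: the function and variable names are hypothetical, and it assumes that the total of 648 is the 36 items multiplied by the nine teachers and by the two administrations, which the report implies but does not state outright.

```python
# Illustrative arithmetic only. Names are hypothetical; figures come from the text.
ITEMS = 36            # questions in each opinionaire
TEACHERS = 9          # teacher members who completed it
ADMINISTRATIONS = 2   # the instrument was given twice
CHANGED = 9           # instances of absolute change in response


def opinionaire_counts(items=ITEMS, teachers=TEACHERS,
                       administrations=ADMINISTRATIONS, changed=CHANGED):
    """Return response totals and the share of responses that changed."""
    per_administration = items * teachers          # responses on one occasion
    total = per_administration * administrations   # responses on both occasions
    return {
        "per_administration": per_administration,
        "total": total,
        # Share of all responses, the base used in the text (assumed reading).
        "share_of_total": changed / total,
        # Share of first/second response pairs, an alternative base.
        "share_of_pairs": changed / per_administration,
    }


if __name__ == "__main__":
    counts = opinionaire_counts()
    print("responses per administration:", counts["per_administration"])
    print("total responses:", counts["total"])
    print(f"changed, as share of total: {counts['share_of_total']:.1%}")
    print(f"changed, as share of pairs: {counts['share_of_pairs']:.1%}")
```

On either base the changed responses are a small fraction of the whole, which is consistent with the report's statement that the two administrations gave nearly identical results.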

Interpretation of data was, in summary, ac- 
complished by direct observation, nonquantified 
content analysis of written materials, compari- 
sons among a variety of written material, be- 
tween written material and verbal responses 
and/or behavior in the group as well as between 
meetings. Further, analysis of written and verbal
data was compared to data on the action level.
Emotional responses, attitudes and values, as 
well as skills were singly and as a highly inter- 
related group constantly scrutinized for patterns 
of change and growth as well as for isolation of 
particular instances of change. 


SECTION IV
Findings of the Study 


EARLIER IN this report, five major hypotheses
were listed. These hypotheses are restated
here. The results of the testing of each hypothesis
follow each one of them respectively.


Hypothesis A: There will be tangible modifications
in classroom practices of the teachers
involved in the study.


Evidence Relating to Hypothesis A— For the 
greater part of the study there were nine teachers 
plus a teaching department head involved in this 
study. Based on reports (oral and written) of 
each of these participants, the evidence of tangi- 
ble modifications is clear-cut and definite in seven 
of the ten participants’ classrooms. In the case 
of the eighth member, changes were less than the 
writer had expected while in the classrooms of the 
ninth and tenth members changes were very few 
or none. 

Some examples of modifications were the following:


1. The initiation of English-social studies core 
classes in the classrooms of two of the 
members. 

2. A systematized, extensive penmanship pro- 
ject in the classroom of one of the members. 

3. Re-evaluation of the purposes of testing chil- 
dren on the part of a member. 

4. Utilization of personality inventories or 
problem checklists. 

5. Planning, developing, recording and evalu- 
ating a ‘‘new’’ spelling program. 

6. Examination of pupils’ reading comprehen- 
sion abilities and subsequent adjustment of 
the reading program. 

7. Re-organization of classroom management 
procedures leading to greater efficiency 
and more ‘‘teaching time’’. 


Hypothesis B: A variety of instructional methods
will be tried and tested.


Evidence Relating to Hypothesis B—in addition 
to the evidence cited under Hypothesis A above,
it should be noted that Hypothesis B was also, in 
the main, supported. 


1. The department head worked with two stu- 
dent teachers and developed the unit plan 
approach. 

2. Another member tried two different meth- 
ods of teaching composition skills. 

3. A third member tried and tested a method 
of improving use of the dictionary. 

4. Still another applied ‘‘phonics’’ principles 
to reading and writing skills and compared
these to a method where phonics were vir- 
tually unmentioned to the pupils. 

5. Another member switched from the arbitrary
teaching of formal grammar to a
method based on greater ‘‘individual analysis’’
and a way of teaching the ‘‘most
necessary skills’’.


Hypothesis C: There will be an increase in the 
confidence of teachers in defining problems. 


Evidence Relating to Hypothesis C—There 
was a gradual and, at times, almost impercep- 
tible increase in confidence of teachers in defin- 
ing problems. For many weeks and months the 
teachers kept examining methods, materials, 
techniques, classroom load and other problems 
external to them. From this they moved to defining
problems in the children: their growth,
personality, learning and cultural problems.

Then the teachers began facing problems re- 
lated to teacher-administrator relations. Some 
of the participants actually got to a point where 
they were examining themselves, analyzing their 
own motives, values and attitudes.


Hypothesis D: Teachers will feel increasingly
secure in exchanging suggestions with each 
other. 


Evidence Relating to Hypothesis D—One has 
but to examine the Minutes of Group Meetings or 
the statements made in the Progress Report and 
the Final Report of the group to trace the increase
of candidness among the members. M7
and M5 as well as M3 and M9 stated explicitly on
more than one occasion that knowing the other 
teachers were facing the same problems as they 
themselves were facing made them feel more
free to ask for help and suggestions. 

During group meetings, especially during 
Phases Three and Four, members did not hesitate
to say, ‘‘Why don’t you try this?’’ or ‘‘I’ve
used this technique and it worked under some 
conditions. Why don’t you give it a try?’’ 

Also, as time went on, the principal, super- 
visor and coordinator felt more free to suggest 
techniques and methods. The department head, 
because of her closeness to the teacher-members,
was able to offer suggestions to the group
from the very beginning. The pattern of re- 
sponse to her suggestions changed as the mem- 
bers became more ‘‘group’’ oriented. 


Hypothesis E: The Head of the English Department
and the Principal will allow and encourage
teachers to try, test, and develop newer
methods, techniques, and courses.


Evidence Relating to Hypothesis E—Minutes 
of group meetings show that the principal repeatedly
encouraged teachers as stated in Hypothesis E.

It was due to his support that the department 
head later worked core into the English curriculum.
The principal supported most suggestions
made by members of the group and helped the
members try different methods. The writer believes
that the principal’s contributions were the
most beneficial ones coming from an administra- 
tor involved in the project and his positive atti- 
tude did contribute a great deal to the life and 
value of the study. 

The evidence for Hypothesis E seems conclusive
to the writer as far as the principal is concerned.
The evidence, on the other hand, to show
that the department head ‘‘allowed and encouraged
teachers to try, test, and develop newer
methods, techniques, and courses’’ is inconclusive
at this time.

There is one thing, however, which this study 
did demonstrate most pointedly. It is related to 
the role of the department head (in this study) but,
more specifically, to members’ use of research 
materials. 

Only twice during this entire study did a mem- 
ber actually utilize the research summaries (find- 
ings of experts, resource materials, or reports 
of research in progress) that were brought in to 
the group by the coordinator. In each of the two 
instances that such material was used it was by 
the same member. On both occasions the mater- 
ial related to penmanship. On the other hand, 
never during the entire forty-week period did any 
member give indication that he ever utilized the 
hectographed reprints of research findings (as
prepared by the coordinator and the group’s secretary).
As far as the evidence of this study
shows, such research materials, made easily 
and conveniently available to all members of the 
group, did not affect or modify the teaching prac- 
tices of any group member. 

Because the department head was strongly 
‘‘for’’ the practice of inviting experts in to ad- 
dress the group, it seems significant to the writ- 
er that the department head as well as the other 
group members never took advantage of research 
summaries and articles in printed form. It might
well be that the opinions or findings of experts as
presented to the group in this study were actually
as ineffectual as the written research materials
which the members were given. Again, this kind
of teacher-reaction seems to bear out the two
major assumptions made at the outset of this
study.


Three Questions Related to the Five 
Major Hypotheses 


1. What are the strong points of the action 
research method and the work-group-con- 
ference technique? 


In the Urban study the most outstanding contri- 
bution of the action research method and employ- 
ment of the work-group-conference technique was 
that classroom instruction was improved. The 
improvement varied from teacher to teacher
over the thirty-eight week period but all of the
teachers stated that the project had afforded 
them opportunities to improve classroom instruc- 
tion. Some examples of teachers’ statements 
are presented below: 


a) ‘‘My opportunity to teach core came as a
direct result of our meetings.’’

b) ‘‘My study was on Remedial Penmanship.
It will culminate in a new Handwriting
Scale. It has solved for me the question
‘How can I improve the penmanship of my
pupils?’ and has answered questions of
many years’ standing.’’

c) ‘‘I have found that many of the ideas and
methods which I have used for the last
eight years are basically sound. The conferences
have motivated me, vexed me,
and defeated my tendency toward laziness
in educational theory. My obsession of being
a scholar has given way to one of being
an outstanding teacher.’’

d) ‘‘I have gleaned many techniques (of instruction)
in the past year from our discussions
that probably would have taken years
to discover by myself, if ever...I was beginning
to think that my own teaching situation
consisted of the four walls of my
classroom but the discussions caused me
to realize more forcefully that education
is a process of the whole school.’’

e) ‘‘...I incorporated ideas heard at meetings
in my teaching...the conferences developed
a liberal attitude within me to experiment
and find better teaching methods;
this makes a better teacher.’’


Besides the explicit statements of the teach- 
ers regarding improved teaching in their class- 
rooms the Progress Reports, Minutes of Group 
Meetings, Consolidation of End-of-Meeting Eval- 
uation Slips, and the Final Report all show that 
teachers were thinking about ways to improve 
classroom instruction. The experiments de- 
scribed in the Final Report reflect much thought, 
work, and effort on the part of most teachers but
the writer believes that the Minutes of Group 
Meetings reveal a much wider and deeper kind 
of effort and growth. In the latter records can 
be found evidence of teacher-growth during the 
entire study project period in broadened viewpoints,
more scientific approach to problems,
more social insight, greater personal interest 
in teaching problems, an increased desire to 
look at children as individuals, and a constantly 
developing awareness that their teaching could 
be improved. In cases where teacher-growth
was less than expected, it seemed to the writer
that many causes could have been operative. 
Among these was the specific problem of ‘‘un- 
favorable group conditions’’. 


2. Under what conditions can these methods 
best be used in creating curriculum change? 


In the original report of this study, it was 
pointed out that certain conditions are necessary 
for 1) a group to develop and operate harmoniously,
and 2) people in a group to be stimulated toward
creative participation. Good channels and
methods of communication are discussed in the
report and also the importance of key people. In 
the Urban study the single most important condi- 
tion which was necessary for the success of the 
project was that of ‘‘support’’. Assuming that 
the curriculum of a given junior high school can 
profit from the concerted effort of a number of 
teachers working at it cooperatively, the evidence 
of the Urban study indicates that it is of utmost importance
that:


a) The department head (in a departmentalized 
junior high school) is fully and completely 
in accord with the idea of trying to do such 
a project. This kind of project will, in all 
likelihood, not succeed unless this support 
is constant. 

b) The principal must not merely ‘‘allow’’ it 
but be active in lending it support by giving 
it the added aura of his prestige and active- 
ly serving it by backing teachers’ decisions 
relative to curricular improvement. He can,
furthermore, be of greater service by par- 
ticipating in group sessions when his pres- 
ence will be a positive force and his contri- 
butions (valuable due to experience and spe- 
cial knowledge) will enhance the work of 
the group. 

c) The supervisor, like the principal and department
head, must accept and support the
idea of action research. In order of prior- 
ity of value to the project (influence, status, 
decision-making), the writer feels the rank- 
ing is department head, principal and, last- 
ly, supervisor. 

d) Enough time must be allowed. This means 
that some teacher-release time should be 
made available and also that the total length 
of the project be allotted a time period appropriate
to the growth and development of
the group of participants. To ‘‘cut off’’ the 
project before the group has completed its 
whole job might mean the negation of much 
that has gone before—the work, time, ef- 
fort, and even, in some cases, isolated 
‘‘islands of success’’ in the ‘‘improving’’
curriculum as well as human relationwise. 
A premature forced stop might well be one 
of the worst things that could happen. 


3. What are the limitations of the methods and
application of action research and the work-group-conference
technique?




The Urban study strongly indicated that initial 
and continuing support of the project by the prin- 
cipal and department head was essential. With- 
out their initial support the study could not have 
begun. Weaknesses and snags in the study usu- 
ally occurred either directly or indirectly as a 
result of a decrease of support by either mem- 
ber. On the other hand, when strong support 
was forthcoming the morale and production of 
the group increased. The limitations of action 
research and the work-group-conference tech- 
nique in this study came about through lack of ad- 
equate support. This lack of support manifested 
itself in several ways and these comprised the 
limitations of the methods and their application. 


a) The Urban project received no financial 
support either from the school fund, the 
English Department fund, or from the sys- 
tem-wide treasury. 

b) Teachers received no release-time for 
meetings of the group. Everything was 
done on their own time after school. 

c) The coordinator and group secretary (a volunteer
for the job) did all recording of minutes,
transcribing, typing, and hectographing.
In addition, many letters, brochures,
reprinted resource materials, and other
correspondence related to the group were
typed, hectographed, and mailed by the coordinator
and group secretary.

e) Although there is undisputable evidence that 
the majority of the group was in favor of 
continuing the study for a third semester, 
this opportunity was lost. 


Another ‘‘limitation’’ of action research as it
was done in this study is that no concise research
design can be formulated long in advance
of initiation of the project. The design evolves 
from the ongoing project. This is certainly an 
initial limitation but it need not be a continuing 
detriment throughout the life-span of a project. 
It is, rather, a ‘‘developmental’’ characteristic
of the method. If the group and the individuals 
in it proceed in a typical fashion, designs should 
become increasingly sharper and more sophisti- 
cated. At Urban some teachers moved from what 
we might characterize as the lowest points on a 
‘‘research competence’’ scale to relatively high
points on the scale in less than forty weeks. 


Conclusion 


The study at Urban Junior High in Detroit
was an honest attempt to test, in a field situation,
the effectiveness of the work-group-conference
method as a technique in curriculum
change and improvement. Over a forty-week 
period the writer saw the teachers’ values, atti- 
tudes and skills relative to education change 
enough to permit acceptance and development of a 
democratic, research-oriented way of doing 
things. 

It would be unrealistic to say that ‘‘great’’ cur- 
ricular changes occurred because of the project, 
but, on the other hand, it was never our expecta- 
tion that changes should or would be of great mag- 
nitude. The terminal point of our study should 
have been, we believe, the ‘‘new’’ point of de- 
parture for the teacher group in further explora- 
tion of the curriculum and themselves. The in- 
sights and skills acquired by each teacher, how- 
ever, during the study may benefit some future 
group in advancing research and cooperative 
action. 


REFERENCES 


Association for Supervision and Curriculum 
Development. 100 Years of Curriculum Im- 
provement, 1857-1957 (Washington, D.C.: 
National Education Association, 1957). 


Benne, Kenneth D., and Muntyan, Bozidar 
(Eds.). Human Relations in Curriculum 
Change (New York: The Dryden Press, 1951). 


Boye, Charles L. (Material presented in lec- 
ture, College of Education, Wayne State Uni- 
versity, 1952.) 


Corey, Stephen M. Action Research to Improve
School Practices (New York: Bureau of
Publications, Teachers College, Columbia
University, 1953).


Lewin, Kurt. Resolving Social Conflict (New
York: Harper and Brothers, 1948).


Meier, Arnold R., and others. A Curriculum
for Citizenship: A Report of the Citizenship
Education Study (Detroit: Wayne State
University Press, 1952).


Thelen, Herbert, and Dickerman, Watson.
‘‘Stereotypes and Growth of Groups,’’
Educational Leadership, VI (February 1949), pp.
309-16.


FOOTNOTES 


1. Examples of all the data-gathering
instruments are shown in the Appendix of the
original dissertation.


2. Within the limits of this report it is possible
to give only brief samples of how data were
interpreted. For complete examination of data
analysis see the original dissertation,
Chapters IV-VII and Appendix, pp. 351-451.


3. There is much evidence in the group’s
75-page Final Report to indicate that all
members moved definitely to the action level. An
abstract of the Final Report is in the Appendix
of the original dissertation.


4. See Chapter VII, pp. 282-95, for a complete
analysis and Appendix of dissertation.




JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, September 1958) 


LITERAL AND CRITICAL READING 
IN SOCIAL STUDIES 


E. ELONA SOCHOR 
Temple University 


The Problem and Its Scope 


THE PURPOSE of this study was to investi- 
gate certain aspects of reading comprehension 
among fifth-grade pupils. In order to explore 
the problem, it was necessary to construct and 
validate an intermediate-grade reading test in 
social studies. The specific problems consid- 
ered were: 


1. What is the relationship between verbal intel- 
ligence and 
a. ‘‘General’’ reading ability? 
b. Achievement in literal reading compre- 
hension in social studies? 
c. Achievement in critical reading com- 
prehension in social studies? 


2. What is the relationship between ‘‘general’’ 
reading ability and 
a. The ability to comprehend literally in 
social studies? 
b. The ability to comprehend critically in 
social studies? 


3. What is the relationship between proficiency 
in literal and critical interpretation of social 
studies? 


4. What is the relationship between proficiency
in each selected critical reading skill and
the ability to comprehend literally in social
studies?


Justification of the Study 


To date, the concepts of the measurement 
and development of reading comprehension as 
held in the 1920’s are still widely evident at all 
educational levels. Reading tests, largely lim- 
ited to the appraisal of ‘‘sense-meaning’’ and 
weighted with materials from the field of litera- 
ture, are used commonly to determine all read- 
ing needs. Little attention is being given to 
critical reading skills in study situations. In 
practice, reading tends to remain a ‘‘unitary
ability.’’


Such a concept of reading ability is no longer
tenable. Conclusive data from studies by Dewey 
(11), Tyler (32), Thorndike (30), and Davis (10) 
indicate that adequate reading comprehension ne- 
cessitates not one but several levels of mental 
functioning. Although this premise is well estab- 
lished, much research still needs to be conduc- 
ted on the more specific aspects of reading com- 
prehension. 

Moreover, the skills and abilities characteristic
of effective reading interpretation are not
the same in all content areas. The specificity 
of skills within subject-matter areas at the sec- 
ondary and college levels has been substantiated 
by Shores (27) and Humber (18). Further data 
are needed on the nature of these skills in each 
content area, particularly at the elementary 
school level. 

Another major reading problem in education 
today is verbalism. Too much emphasis has 
been placed upon a low-level type of
interpretation. Retardation in reading has been
determined too frequently in terms of word perception
alone. As early as 1921, investigators reported 
deficiencies in the reading comprehension of 
some school children. Although these children 
could reproduce what they had read with a
‘‘parrot-like precision,’’ they had little real
understanding. Since then, many educators and
research workers have stressed the importance of
this problem (4,17,25). Nevertheless, verbal- 
ism in reading still appears to be rampant in 
every phase of school activity. To help resolve 
this situation, measures of appraising and tech- 
niques for developing the various aspects of lit- 
eral and critical interpretation in reading need 
to be investigated. The major solution, however, 
rests with the schools. Effective reading com- 
prehension must be emphasized in all reading ac- 
tivities. 

The importance of reading comprehension is
not limited to school life. The ability to
interpret what is read on current events is vital
to the preservation of democracy. In such a social
order, comprehending what is stated directly is a
prerequisite. Mere literal interpretation,
however, is not sufficient. The citizen must be
skilled in evaluating critically the wealth of
available printed materials.


* Abstract of Doctor of Education thesis, Teachers College, Temple University.


Limitations of the Study 


Experimental Design—The purpose of this
investigation was to study the relationships
between intelligence and three types of reading
ability: ‘‘general’’ reading, literal
interpretation, and critical interpretation. Final data
were obtained on a representative sample of five 
hundred thirteen fifth-grade pupils. To obtain 
these data on reading skills in social studies, it 
was necessary to construct and validate the ex- 
perimental edition of a reading test in that con- 
tent area. A group test of verbal intelligence 
and a standardized reading test were used to help 
in estimating the normality of the distribution. 

Statistical Design—The results were ana- 
lyzed with large sample techniques which includ- 
ed the Chi-square test for presence or absence 
of relationship, and the product-moment and 
point-biserial methods of correlation. The read- 
ing test in social studies was evaluated by means 
of one technique estimating test reliability and 
three techniques estimating item validity. 

The Population—A total of six hundred elev- 
en pupils were tested. Complete results ob- 
tained on five hundred thirteen subjects were 
used in the study. 


1. Source: Eighteen fifth-grade classes were 
tested in June 1949. Nine of these were 
from four suburban Philadelphia schools, 
and nine from four urban Philadelphia 
schools. 

2. Age: The chronological age range was from
10-0 through 14-6.

3. Sex: Boys and girls were tested.

4. Race: White and Negro children were
included.

5. Intelligence: Verbal intelligence quotients
ranged from 57 through 158.

6. Reading Grade: The reading grade ranged
from minus 2.5 through 10.3.

7. Final Population Criterion: Two hundred
sixty-nine cases of the final population fell
within plus or minus one standard deviation
from the mean on the two criterion
measures: intelligence and reading grade.
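
The final population criterion above amounts to counting the cases that lie within one standard deviation of the mean on both measures at once. A minimal Python sketch of that selection follows. The function names and the toy scores are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which form was computed.

```python
from statistics import mean, pstdev

def within_one_sd(values):
    """Flag each value that lies within one SD of the mean.

    Population SD is assumed here; the abstract does not specify.
    """
    m, s = mean(values), pstdev(values)
    return [abs(v - m) <= s for v in values]

def central_cases(iq, reading_grade):
    """Indices of pupils within +/-1 SD on BOTH criterion measures."""
    ok_iq = within_one_sd(iq)
    ok_rg = within_one_sd(reading_grade)
    return [i for i, (a, b) in enumerate(zip(ok_iq, ok_rg)) if a and b]

# Invented toy data, not the study's scores.
iq = [85, 95, 100, 105, 115, 140]
reading_grade = [3.0, 4.5, 5.0, 5.5, 6.0, 9.5]
print(central_cases(iq, reading_grade))
```

With the toy data, the first and last pupils fall outside one standard deviation on both measures and are excluded.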


Tests 


The Gates Reading Survey, Form I, Level of 
Comprehension (Published by Teachers 
College, Columbia University), was adminis- 
tered as a power test in ‘‘general’’ reading com- 
prehension. The Experimental Edition of the In- 
termediate Reading Test, Social Studies, was 
used to appraise literal and critical
interpretation in social studies. The Pintner General
Ability Test, Form A (published by World Book
Company), was used as a measure of verbal
intelligence.


Definition of Terms 


The following terminology is basic to this 
study. For purposes of clarity, literal reading 
comprehension and the selected critical reading 
comprehension skills will be illustrated as well 
as defined. In each example, the correct
response for the test item will be the first, and one
of the distractors will be the second.

‘‘Literal Reading’’ represents the ability to 
obtain a low-level type of interpretation by using 
only the information explicitly stated. For exam- 
ple, the selection states, ‘‘Millions of workers 
dragged stone blocks for the outside walls and 
packed basket after basket of earth between them.’’ 
The test item appraising literal interpretation of 
this sentence is: ‘‘The outside walls were made 
of (1) stone, (2) earth. . .’’

‘‘Critical Reading’’ represents the ability to
obtain a level of interpretation higher than that
needed for literal interpretation. In this study 
the following critical reading comprehension 
skills were set up: 


1. Functional Vocabulary tests the reader’s
background of experience in reference to a
concept used in the selection.

2. Semantic Variation of Vocabulary tests the
reader’s ability to identify a similar usage
of a given word from the selection. For
example, the word ‘‘beat’’ is employed in this
manner in a story: ‘‘Every day cruel slave
drivers beat these workers. . .’’ The test
item is: ‘‘The sentence which uses the word
beat just as it is used in line 26 of the story
is (1) Mother said, ‘Beat the rug until it is
clean.’ (2) The policeman’s beat was
several miles long. . .’’

3. Central Theme tests the ability to
distinguish the central topic of the selection from
subordinate ones. An example is: ‘‘This
story as a whole is about (1) the largest
wall in the world, (2) the early emperor of
. . .’’

4. Key Idea tests the ability to identify the key,
or most important, idea in the story. One
test item is: ‘‘The most important idea in
the story is that the Great Wall was (1)
acting like an army, (2) used as a highway. . .’’

5. Inference tests the ability to draw a specific
conclusion indirectly from the material
given. For example, the first selection
discusses the need for the Great Wall and
then states, ‘‘It was longer than 1500
miles, more than half the distance across
our own country.’’ The test item is: ‘‘The
Emperor of China needed the Great Wall
because China (1) was too large to protect
with soldiers, (2) had only a few soldiers
who rode on horseback. . .’’

6. Generalization tests the ability to identify a
general conclusion or principle indirectly
from information implicitly stated. An
example is: ‘‘From the story we should
believe that ALL (1) buildings that last have
been built carefully, (2) workers of China
are better than the workers in our country
. . .’’

7. Problem Solving tests the ability to apply
information from the selection to a
problematic situation. One test item is: ‘‘Mrs.
Brown paid twenty-five cents for a can of
peaches. She said, ‘This is how the farmer
gets rich.’ She was wrong because (1)
the farmer gets only a part of the
twenty-five cents, (2) farmers get rich from dairy
products. . .’’

8. Association of Ideas tests the ability to see
the relationship among ideas in a series.
For example, ‘‘The row with ideas from
the story that belong together is (1) fierce,
cruel, savage; (2) enemy, builders, horses,
. . .’’

9. Analogy tests the ability to perceive
relationship between two pairs of ideas. The
idea which completes an established
relationship is identified: ‘‘Stones are to
building as people are to (1) nation, (2) houses
. . .’’

10. Antecedent tests the ability to recognize the
word or words to which a selected pronoun
refers. For example, ‘‘The word them
in line 25 of the story refers to (1) outside
walls, (2) people of China. . .’’

11. Sequence tests the ability to determine a
time sequence. One test item: ‘‘Below is
a story about how certain vegetables reach
the store. The first idea out of order is
(1) The vegetables are canned, (2) the
vegetables are processed. . .’’

12. Extraneous Idea tests the ability to
determine relevancy of ideas to a particular
selection. For example, ‘‘The idea NOT
found in the story is that many (1) people
were buried in the Great Wall, (2)
emperors used the wall for protection. . .’’

13. Author Purpose tests the ability to identify
the author’s primary motive in writing a
given selection. One test item is: ‘‘The
author wrote this story because he thinks
we should know (1) about great things in
other countries, (2) about the enemies of
. . .’’

‘‘Survey Reading Comprehension’’ is a measure
of understanding based on the results of a
reading test which uses content largely from the
field of literature. The comprehension section
of the Gates Reading Survey was used in this
study. ‘‘General’’ Reading Comprehension is
used synonymously with ‘‘survey’’ reading
comprehension.

‘‘Verbal Intelligence’’ is a measure of capac- 
ity which is obtained from a test that usually re- 
quires a high degree of language facility both in 
understanding directions and in the subject’s re- 
sponses. 


A Review of Kindred Literature 


Although most of the research on reading com- 
prehension and test construction has been conduct- 
ed at the secondary or college levels, investiga- 
tions at the elementary level tend to confirm the 
conclusions indicated in the research at the
higher levels. Accordingly, the pertinent
conclusions from all the studies are summarized
in terms of two major areas: critical reading
comprehension and test construction.


Critical Reading Comprehension 


In 1917 Thorndike published three articles 
emphasizing the premise that reading is a
thinking process (29, 30, 31). Since that time,
educators have been concerned not only with the
‘‘sense-meaning,’’ or literal comprehension of
printed material (14, 34), but also with a more thorough
interpretation, or critical comprehension. Crit- 
ical reading comprehension has been defined as 
critical thinking in reading situations (4). 

Critical Thinking—Since critical thinking is
basic to critical reading comprehension, a
summary of the conclusions in the research on
critical thinking is pertinent to this investigation:

1. Critical thinking necessitates the
functioning of higher level thought processes (3, 30).

2. Critical thinking appears to be a complex
of component abilities, some of which seem
to have been identified (3, 12, 13, 32).

3. The manifestation of intelligence does not
guarantee the ability to think critically (3,
13).

4. The ability to think critically in one content
area cannot be assumed to indicate that the
same is true in another (25).

5. Aspects of critical thinking can be
measured by paper-and-pencil tests (13, 32, 33).

6. Certain aspects of critical thinking can be
improved by instruction (13, 26, 33).

7. Fifth-grade children can think critically.
Moreover, the difference between their
ability to reason and that of adults is
merely a quantitative one (8, 16, 21).


Critical Thinking in Reading Situations—The
research on critical reading comprehension
reveals:


1. Critical reading comprehension has the same
attributes as those stated above for critical
thinking, but they apply when critical thinking
is done in reading situations (4, 10, 11, 12, 13,
26, 30, 31).

2. Literal reading comprehension appears to
necessitate mental functioning of a lower level
than critical reading comprehension (3, 11, 14,
32, 34).

3. The ability to comprehend critically cannot be
predicted from the ability to comprehend
literally, or factually (3, 11, 12, 25, 32).


Test Construction 


The need for better test measures at the
elementary level has been stressed repeatedly in the
literature (4, 7, 19). The following list of
characteristics includes the major suggestions from
pertinent literature.

Readability—In constructing a test, the author 
should consider the two aspects of readability (4, 
9, 22): 

1. The reader: his experience, interests, and
language facility.

2. The material: the interest level, the
language, the concepts, and the mechanical
features.


Reading in the Content Areas—This aspect of 
reading has significant implications for test con- 
struction: 

1. Since reading skills vary between content
areas and success in reading the materials
of one content area cannot be used as a
criterion of success in another content area,
test materials should be built from
materials within a given content area (6, 7, 18, 27).

2. Since reading is a complex of many
abilities and skills which vary within one
content area, specific skills should be
appraised (3, 10, 11, 12, 18, 27).

3. Since ability in critical reading
comprehension cannot be predicted from ability in
literal comprehension, a reading test should
include both (2, 4, 15, 20, 24, 25).


Mechanical Features— The following criteria 
are suggested in the literature for test construc- 
tion (1, 7, 28): 

1. The test materials should be valid and
reliable.

2. The number of items appraising each skill
should be large enough to show the degree
to which the subject possesses that skill.

3. The directions should be clear and
consistent for each administration of the test.

4. Each multiple-choice item should have: (1)
at least five alternate responses, (2) one
best answer, (3) the correct answer
randomized, and (4) plausible distractors of
about equal length.


Summary of Procedure 


The following procedure was used in this 
study: 


1. A preliminary edition of The Intermediate
Reading Test: Social Studies, designed to
appraise both literal interpretation and
specific critical reading comprehension skills, was
constructed and validated.

2. A preliminary study was conducted in which
the test was administered to one hundred and
forty-three children in grades four, five,
and six. The results were used to evaluate
the preliminary edition in terms of
readability and the discriminating power and internal
consistency of each test item.

3. The measure was revised and called the
experimental edition of The Intermediate
Reading Test: Social Studies.

4. The experimental edition of the reading test
was administered to five hundred and thirteen
children not included in the preliminary
study.

a. The reliability of the experimental edition
was computed by means of the
Kuder-Richardson Estimate of Test Reliability.

b. The validity of each test item was
evaluated by using (1) the Standard Error of the
Difference Between Proportions, (2) an
estimate of the product-moment coefficient
of correlation based on the upper and
lower 27% of the distribution, and (3)
inspection of the total number of choices for each
distractor.

5. Two standardized tests were administered to
the population used in the major study: The
Gates Reading Survey (Level of
Comprehension) to appraise ‘‘general’’ reading ability
and The Pintner General Ability Test
(Verbal Series) to obtain verbal intelligence
quotients.

6. The product-moment method of correlation
was used to estimate the degree of
relationship between the four variables: intelligence,
‘‘general’’ reading ability, literal
comprehension in social studies, and critical
interpretation in social studies.

7. Partial correlation was used to estimate the
degree of relationship between the three
types of reading ability (‘‘general’’ reading
ability, literal comprehension in social
studies, and critical interpretation in social
studies) when intelligence was partialled out.

8. Chi-square was used to determine the
presence or absence of relationship between
literal reading and each critical reading skill
in social studies.

9. The point-biserial method of correlation was
utilized to estimate the degree of relationship
between literal reading and each critical
reading skill in social studies.
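
The reliability and item-validity steps in the procedure above can be illustrated with a short Python sketch. It assumes that the ‘‘Kuder-Richardson Estimate’’ is formula 20 (the abstract does not say which Kuder-Richardson formula was used) and that the standard error of the difference between proportions takes the unpooled form. The function names and the response data are invented for illustration. The estimate of the product-moment coefficient from the upper and lower 27% is not reproduced here.

```python
from math import sqrt

def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumption of this sketch).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

def upper_lower(responses, fraction=0.27):
    """Split pupils into the top and bottom `fraction` by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(fraction * len(ranked)))
    return ranked[:g], ranked[-g:]

def proportion_difference(upper, lower, item):
    """Difference in proportion passing `item`, with its standard error."""
    p_u = sum(row[item] for row in upper) / len(upper)
    p_l = sum(row[item] for row in lower) / len(lower)
    se = sqrt(p_u * (1 - p_u) / len(upper) + p_l * (1 - p_l) / len(lower))
    return p_u - p_l, se

# Invented toy data: ten pupils, four items (1 = correct).
pattern = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
responses = pattern + pattern

print(round(kr20(responses), 3))
upper, lower = upper_lower(responses)
diff, se = proportion_difference(upper, lower, item=3)
print(round(diff, 3), round(se, 3))
```

An item discriminates well when the difference between the upper and lower groups is large relative to its standard error. The third item-validity technique, inspecting how often each distractor is chosen, is a simple tally and is omitted.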


Summary of Results 


Problem I: The degree of relationship between
intelligence and ‘‘general’’ reading ability,
literal reading comprehension, and critical
reading interpretation in social studies, as
estimated by the product-moment method of
correlation, was .83 ± .01, .72 ± .02, and .69 ± .02
respectively.
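
The abstract does not state how the error terms attached to these coefficients were obtained. As a hedged illustration, the classical large-sample standard error of a correlation coefficient, (1 - r^2) / sqrt(N), with N = 513, gives values of the same size as those reported for Problem I. The sketch below assumes that formula, and the function name is hypothetical.

```python
from math import sqrt

def se_of_r(r, n):
    """Large-sample standard error of a product-moment r (assumed form)."""
    return (1 - r * r) / sqrt(n)

N = 513  # pupils with complete results
for label, r in [("general reading", 0.83), ("literal", 0.72), ("critical", 0.69)]:
    print(f"{label}: r = {r:.2f} +/- {se_of_r(r, N):.2f}")
```

The agreement with the printed figures supports, but does not prove, that this is the form the author used.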


Problem II: The degree of relationship between
‘‘general’’ reading ability and literal and
critical reading comprehension in social studies,
as estimated by the product-moment formula,
was .76 ± .02 and .64 ± .03 respectively.
When intelligence was partialled out, the
correlation coefficients were .42 and .17
respectively.


Problem III: The degree of relationship between
literal and critical reading interpretation in
social studies, as estimated by the
product-moment formula, was .61 ± .03. With
intelligence controlled, the correlation was .23.
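
The partialled-out coefficients can be checked against the zero-order correlations with the standard first-order partial correlation formula. The Python sketch below applies it to the rounded coefficients reported under Problems I to III. The variable names are hypothetical, and because the inputs are rounded to two places the results are approximate.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order coefficients as reported under Problems I-III.
r_iq_general, r_iq_literal, r_iq_critical = 0.83, 0.72, 0.69
r_general_literal, r_general_critical = 0.76, 0.64
r_literal_critical = 0.61

# General reading vs. literal reading, intelligence held constant.
print(f"{partial_r(r_general_literal, r_iq_general, r_iq_literal):.2f}")    # 0.42
# General reading vs. critical reading, intelligence held constant.
print(f"{partial_r(r_general_critical, r_iq_general, r_iq_critical):.2f}")  # 0.17
# Literal vs. critical reading, intelligence held constant.
print(f"{partial_r(r_literal_critical, r_iq_literal, r_iq_critical):.2f}")  # 0.23
```

Much of the apparent relationship among the three reading measures is carried by their common relationship to verbal intelligence, which is why the coefficients drop so sharply once it is held constant.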


Problem IV: The degree of relationship between 
success on each critical reading skill and the 
total literal reading score was computed by 
two methods: 

a. The twenty-three point-biserial coefficients
of correlation ranged from .45 to -.17 with
a ‘‘median’’ coefficient of .23, and the seven
‘‘combined’’ point-biserial coefficients
ranged from .28 to .06 with a median of .26
after those test items which failed to be
significant on the chi-square test of
significance were excluded. The critical reading
skill with the greatest degree of
relationship was ‘‘functional vocabulary’’ (.28); the
skill with the least was ‘‘extraneous idea’’
(.06).

b. The estimates of the product-moment
coefficients of correlation ranged from .61 to
-.27 with a ‘‘median’’ coefficient of .25.
Twenty-two of the critical reading test
items appeared to be significant on the basis
of three probability values.
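
For readers unfamiliar with the two statistics named under Problem IV, the following Python sketch computes a point-biserial coefficient between one pass/fail item and a total score, and a chi-square value for the corresponding 2 x 2 table. The data are invented, the function names are hypothetical, and the median split used to build the table is an assumption: the abstract does not say how the literal reading score was dichotomized for the chi-square test.

```python
from math import sqrt
from statistics import mean, median, pstdev

def point_biserial(item, scores):
    """Correlation between a 0/1 item and a continuous total score."""
    passed = [s for i, s in zip(item, scores) if i == 1]
    failed = [s for i, s in zip(item, scores) if i == 0]
    p = len(passed) / len(item)
    return (mean(passed) - mean(failed)) / pstdev(scores) * sqrt(p * (1 - p))

def median_split_table(item, scores):
    """2 x 2 counts: item pass/fail by score above/at-or-below the median."""
    cut = median(scores)
    a = sum(1 for i, s in zip(item, scores) if i == 1 and s > cut)
    b = sum(1 for i, s in zip(item, scores) if i == 1 and s <= cut)
    c = sum(1 for i, s in zip(item, scores) if i == 0 and s > cut)
    d = sum(1 for i, s in zip(item, scores) if i == 0 and s <= cut)
    return a, b, c, d

def chi_square_2x2(a, b, c, d):
    """Chi-square with 1 df, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Invented toy data: one critical-reading item and a literal reading total.
item = [1, 1, 1, 0, 1, 0, 0, 0]
literal = [30, 28, 25, 24, 22, 20, 18, 15]

print(round(point_biserial(item, literal), 3))
print(chi_square_2x2(*median_split_table(item, literal)))
```

The point-biserial coefficient is the ordinary product-moment correlation for the special case in which one variable takes only two values, which is why it suits the comparison of single test items with a total score.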


Conclusions 


Within the limitations of this study as stated
in Chapter I (see original thesis on file at Temple
University), the following conclusions appear to
be valid:


Problem I 


1. Verbal intelligence appears to be very highly
related to ‘‘general’’ reading ability
(.83 ± .01).

2. Verbal intelligence appears to be
substantially related to the ability to comprehend
literally in social studies (.72 ± .02).

3. Verbal intelligence appears to be
substantially related to the ability to comprehend
critically in social studies (.69 ± .02).


Problem II 


1. ‘‘General’’ reading ability appears to be
highly related to literal reading interpretation of
social studies materials (.76 ± .02). When
intelligence is partialled out, the relationship
appears to be substantial (.41 ± .04).

2. ‘‘General’’ reading ability appears to be
substantially related to critical reading
interpretation in social studies (.64 ± .03). When
intelligence is held constant, the relationship
appears to be low (.17 ± .04).


Problem III


Literal reading comprehension appears to be
substantially related to critical reading
comprehension in social studies (.61 ± .03). With
intelligence held constant, the relationship
appears to be negligible (.23 ± .04).


Problem IV


Each selected critical reading skill appears
to show a negligible or low relationship to the
ability to comprehend literally in social
studies.


General Conclusions 


1. Reading comprehension in social studies
appears to be a composite of many skills and
abilities which apparently function at various
levels of mental activity.

2. Literal and critical reading comprehension
in social studies appear to be relatively
independent abilities when intelligence is held
constant.

3. Individual critical reading comprehension
skills appear to be relatively independent of
the ability to comprehend literally in social
studies.

4. When intelligence is held constant, critical
reading comprehension in social studies
appears to be virtually independent of
‘‘general’’ reading ability; literal reading
comprehension, relatively independent of
‘‘general’’ reading ability.

5. Group tests of ‘‘general’’ reading ability and
group tests of verbal intelligence tend to
measure common factors.


Implications 


Several school practices need to be consid- 
ered thoughtfully if reading instruction in social 
studies is to be improved: 

1. The use of a ‘‘general’’ reading test to
identify all reading needs.

2. The practice of teaching reading as a
‘‘unitary’’ ability in materials taken from the
field of literature.

3. The use of a group, verbal intelligence
test to estimate the intelligence of all
pupils.


A reading test appraising ‘‘general’’ reading 
ability does not identify all reading needs. By 
definition, it is ‘‘survey’’ in nature and lacking 
in specificity. Frequently it is limited to a
low-level type of interpretation. Furthermore, the
usual reading test is composed primarily of ma- 
terials from the field of literature. 

Such a test is inadequate, in the first place, 
because reading comprehension cannot be con- 
fined to the interpretation of the sense-meaning 
in literature materials. Reading is a complex 
process embracing many levels of interpretation 
and many different skills and abilities. 

In the second place, the reading skills and 
abilities necessary to adequate interpretation 
vary considerably within and between the various 
subject-matter fields. Accordingly, specific 
needs in reading comprehension, particularly in 
critical reading comprehension, should be ident- 
ified by means of informally constructed tests 
and daily appraisal during teaching sessions. 

The identification of specific needs in literal 
and critical reading comprehension is merely 
the first step in improving the reading skills of 
any school population. Developmental instruc- 
tion in reading skills and abilities needs to be 
provided systematically in all content areas. 
More emphasis should be placed upon higher 
levels of reading interpretation to avoid
verbalism. The comparison of the literal and critical
scores in this study indicates that the pupils 
tested lacked the ability to interpret social 
studies materials critically. It is the responsi- 
bility of each classroom teacher to provide for 
the systematic training required to develop such 
abilities. 

To appraise a retarded reader’s mental ca- 
pacity by means of a group, verbal intelligence 
test is a highly questionable procedure. The 
amount of relationship between the group, verbal
intelligence test and the group reading test
in this study as well as in other studies implies
that one can be predicted from another with
considerable accuracy. Therefore, a child unable
to read cannot perform at or near his mental ca- 
pacity level on such an intelligence test. When 
reading retardation is apparent, it is advisable 
to use an individual measure of mental capacity. 

Another major need in education is the
construction of reliable and valid measures to
appraise critical reading skills in all
subject-matter areas at the elementary school level.
Critical reading skills are not included in
content-area tests available now. Such a test should
yield a score on each critical reading
comprehension skill so that specific needs can be
identified.


Suggestions for Further Research 


Further inquiry into reading comprehension 
appears to be warranted. The following prob- 
lems are in need of investigation: 

1. The relationships between, and
interrelationships among, the critical reading
comprehension skills in social studies.

2. The investigation of other skills
necessitating a higher level of interpretation in
social studies.

3. A factorial analysis of critical reading
comprehension in social studies.

4. The investigation of literal and critical
reading comprehension within other content
areas and between content areas.

5. Investigations on the development of each
critical reading comprehension skill.

6. Studies to evaluate effective methods
for developing critical thinking in
reading.

7. The construction of valid measures to
appraise literal and critical reading
comprehension skills in all content areas at the
elementary school level.


BIBLIOGRAPHY 


1. Adkins, Dorothy C., et al. Construction and
Analysis of Achievement Tests (Washington,
D.C.: Superintendent of Documents,
U. S. Government Printing Office, 1947).

2. Anderson, Howard R. (Editor). ‘‘Teaching
Critical Thinking in the Social Studies,’’
Thirteenth Yearbook of the National Council
for the Social Studies (Washington, D.C.:
National Education Association, 1942).

3. Bedell, Ralph Clarion. The Relationship
Between the Ability to Recall and the Ability
to Infer in Specific Learning Situations,
unpublished Ph.D. Dissertation,
University of Missouri, 1934.

4. Betts, Emmett A. ‘‘Guidance in the
Critical Interpretation of Language,’’
Elementary English, XXVII (January 1950), pp. 9-

5. Betts, Emmett A. ‘‘Readability: Its
Application to the Elementary School,’’ Journal
of Educational Research, XLII (February
1949), pp. 438-59.

6. Bond, Eva. Reading and Ninth Grade
Achievement, Contributions to Education,
No. 756 (New York: Bureau of Publications,
Teachers College, Columbia University,
1938).

7. Conant, Margaret M. The Construction of a
Diagnostic Reading Test, Contributions to
Education, No. 861 (New York: Bureau of
Publications, Teachers College, Columbia
University, 1942).

8. Croxton, W. C. ‘‘Pupils’ Ability to
Generalize,’’ School Science and Mathematics,
XXXVI (June 1936), pp. 627-34.

9. Dale, Edgar, and Chall, Jeanne S. ‘‘A
Formula for Predicting Readability:
Instructions,’’ Educational Research Bulletin,
XXVI (February 17, 1948), pp. 37-54.

10. Davis, Frederick B. Fundamental Factors
of Comprehension in Reading, unpublished
Ph.D. Dissertation, Harvard University,
1941.

11. Dewey, Joseph C. ‘‘The Acquisition of Facts
as a Measure of Reading Comprehension,’’
Elementary School Journal, XXXV
(January 1935), pp. 346-48.

12. Gans, Roma. A Study of Critical Reading
Comprehension in the Intermediate Grades,
Contributions to Education, No. 811 (New
York: Bureau of Publications, Teachers
College, Columbia University, 1940).

13. Glasser, Edward M. An Experiment in the
Development of Critical Thinking,
Contributions to Education, No. 643 (New York:
Bureau of Publications, Teachers College,
Columbia University, 1941).

14. Gray, William S. ‘‘Reading and Factors
Influencing Reading Efficiency,’’ in Reading
in General Education (Washington, D.C.:
American Council on Education, 1940), pp.
18-44.

15. Gray, William S. (Chairman). Report of the
National Committee on Reading,
Twenty-Fourth Yearbook of the National Society
for the Study of Education, Part I
(Bloomington, Ill.: Public School Publishing Co.,
1925).

16. Hazlitt, Victoria. ‘‘Children’s Thinking,’’
British Journal of Psychology, XX, Part
IV (April 1930), pp. 354-61.

17. Horn, Ernest. Methods of Instruction in the
Social Studies (New York: Charles
Scribner’s Sons, 1937).


18. Humber, W. J. An Experimental Analysis
of Selected Reading Skills as Related to
Certain Content Fields at the University
of Minnesota, unpublished Ph.D.
Dissertation, University of Minnesota, 1942.

19. Husband, K. L., and Shores, J. Harlan.
‘‘Measurement of Reading for Problem
Solving—A Critical Review of the
Literature,’’ Journal of Educational Research,
XLII (February 1950), pp. 453-65.

20. Irion, Theodore W. H. Comprehension
Difficulties of Ninth Grade Students in the
Study of Literature, Contributions to
Education, No. 169 (New York: Bureau of
Publications, Teachers College, Columbia
University, 1925).

21. Long, Louis, and Welch, Livingston.
‘‘Reasoning Ability in Young Children,’’ Journal
of Psychology, XII (July 1941), pp. 21-44.

22. Lorge, Irving. ‘‘Predicting Readability,’’
Teachers College Record, XLV (March
1944), pp. 404-19.

23. Maney, Ethel S. Literal and Critical
Reading in Science, unpublished Ed.D.
Dissertation, Temple University, 1952.

24. McCallister, James M. ‘‘Reading
Difficulties in Studying Content Subjects,’’
Elementary School Journal, XXXI (November
1930), pp. 191-201.

25. Morse, Horace T., and McCune, George H.
Selected Items for the Testing of Social
Studies, National Council for the Social
Studies, Bulletin No. 15 (Washington, D.C.:
National Education Association, 1949).

26. Salisbury, Rachel. A Study of the Transfer
Effects of the Training in Logical
Organization, unpublished Ph.D. Dissertation,
University of Wisconsin, 1934.

27. Shores, J. Harlan. Reading and Study
Skills as Related to Comprehension of
Science and History in the Ninth Grade,
unpublished Ph.D. Dissertation, University
of Minnesota, 1940.

28. Symonds, Percival M. ‘‘Factors
Influencing Test Reliability,’’ Journal of
Educational Psychology, XIX (February 1928), pp.

29. Thorndike, Edward L. ‘‘The Psychology
of Thinking in the Case of Reading,’’
Psychological Review, XXIV (May 1917), pp.


Thorndike, Edward L. ‘‘Reading as Rea- 
soning: A Study of Mistakes in Paragraph 
Reading, ’’ Journal of Educational Psychol- 
ogy, VIII (June 1917), pp. 323-32. 

Thorndike, Edward L. ‘‘The Understand- 


ing of Sentences,’’ Elementary School 
Journal, XVIII (October 1917), pp. 98- 
Tit. 


Tyler, Ralph W. ‘‘Measuring the Ability to 
Infer,’’ Educational Research Bulletin, IX 
(November 19, 1930), pp. 475-80. 

Wrightstone, J. Wayne. Appraisal of New- 


56 


JOURNAL OF EXPERIMENTAL EDUCATION 


er Elementary School Practices (New 34. Zahner, Louis C. ‘‘Approach to Reading 
ork: Bureau ications, Teach- Through Analysis of Meanings, ’’ Heading 

ers College, Columbia University, in General Education (Washington, D. C.: 

1938). American Council on Education, 1940). 




JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, September 1958) 


LITERAL AND CRITICAL READING
IN SCIENCE


ETHEL S. MANEY 
Temple University 


The Problem and Its Scope 


THE MAJOR purpose of this investigation was 
to determine the relationships between selected 
factors of reading comprehension as revealed by 
fifth-grade children. In order to obtain data on 
this problem, it was necessary to evaluate the 
experimental edition of an intermediate reading 
test in science. Specifically, the problems con- 
sidered were: 


1. What is the relationship between literal 
and critical reading comprehension of science 
materials? 

2. What is the relationship between verbal in- 
telligence and 

a. ‘‘Survey’’ or ‘‘general’’ reading comprehension?

b. Literal reading comprehension in science? 

c. Critical reading comprehension in science? 


3. What is the relationship between reading 
comprehension as measured by a standardized 
reading survey test and that appraised by 

a. A literal reading test in science? 

b. A critical reading test in science? 


4. To what degree does each selected critical 
reading skill tend to be independent of the ability 
to read literally when science materials are 
used? 


Justification of the Study 


Recent investigations have cast doubt upon 
certain traditional school practices: (a) employ- 
ing a single reading test to measure reading abil- 
ity in all situations, and (b) considering the de- 
velopment of critical reading skills as a concom- 
itant of intelligence, maturation, and normal 
school progression. 

One of the assumptions underlying these practices
supports the unitary concept of reading ability,
that is, that reading ability is so generalized
that a reader can obtain a commensurate degree
of interpretation regardless of the content or purpose.

Davis (10), Humber (21), Shores (28), and
Swenson (30) have produced experimental evidence
at the secondary school level which refutes
this entity concept of reading. They found
that (a) content and purpose dictate the skills to
be employed in reading a particular selection,
(b) ability to read in one content area does not
ensure equal success in another, and (c) ability
to interpret content literally does not guarantee
commensurate ability in a higher level of interpretation.

There appears to be need for experimental 
data at the elementary school level as well. 
This study is one attempt to provide information
concerning certain skills used in reading science
materials at that level.


Limitations of the Study 


Experimental Design: This study was primarily
a statistical analysis of reading comprehension
as exhibited by a representative sampling
of fifth-grade children. It was also concerned
with the validation of the experimental edition of
a reading test in science. To provide additional
data, a test of verbal intelligence and a standardized
reading test were administered.

Statistical Design: The results of the tests
administered were statistically analyzed by
large sample procedures which included intercorrelation,
chi-square, and point-biserial correlation.
The reading test in science was evaluated
by several measures of item analysis.
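
As an illustration only, the point-biserial coefficient named in the Statistical Design can be sketched in Python. It relates a pass/fail result on one item to a continuous total score. The function name, the toy data, and the use of the population standard deviation are assumptions of this sketch; the study does not report its computing formula.

```python
import math


def point_biserial(passed, scores):
    """Point-biserial r between a pass/fail item and a continuous total score.

    passed: sequence of 0/1 (or False/True) item results, one per pupil.
    scores: sequence of total scores, in the same pupil order.
    """
    if len(passed) != len(scores):
        raise ValueError("passed and scores must have the same length")
    n = len(scores)
    pass_scores = [s for p, s in zip(passed, scores) if p]
    fail_scores = [s for p, s in zip(passed, scores) if not p]
    if not pass_scores or not fail_scores:
        raise ValueError("need at least one passing and one failing pupil")

    mean_all = sum(scores) / n
    # Population standard deviation of all total scores (divide by n).
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd_all == 0:
        raise ValueError("total scores have zero variance")

    p = len(pass_scores) / n  # proportion of pupils passing the item
    mean_pass = sum(pass_scores) / len(pass_scores)
    mean_fail = sum(fail_scores) / len(fail_scores)
    return (mean_pass - mean_fail) / sd_all * math.sqrt(p * (1 - p))


# Hypothetical toy data (not from the study): four pupils, one item.
item_passed = [1, 1, 0, 0]
literal_totals = [4, 3, 2, 1]
print(round(point_biserial(item_passed, literal_totals), 3))  # prints 0.894
```

With this form, a coefficient near zero means that pupils who pass the item and pupils who fail it have about the same mean total score.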

The Population: Five hundred thirteen subjects
were used in the study.

1. School Background: All subjects were in
the last month of the fifth grade during
June 1949. They were members of unselected
fifth-grade classes from four suburban
Philadelphia and four urban Philadelphia
schools. Eighteen different classes
were represented in the final population,
nine urban and nine suburban.

2. Age: The chronological age range was
from 10-0 years to 14-6 years, inclusive.

3. Sex: Both sexes were represented in the
study.

4. Intelligence: The verbal intelligence quotient
range for the population was from 57
to 158, inclusive.

5. Reading Grade: The standardized reading
grade ranged from less than 2.5 through
10.3.

6. Final Population Criteria: Only subjects
who completed all tests were included
in the final population. Of the six hundred
eleven children tested, five hundred thirteen
had complete results. Two hundred
sixty-nine of the final population fell within
plus or minus one standard deviation of
the mean on both the intelligence and the
standardized reading tests.


*Abstract of a Doctor of Education dissertation, Teachers College, Temple University, 1952.


Tests Administered: To obtain a measure of
reading comprehension on an untimed power test,
The Gates Reading Survey, Grades 3 to 10, Form
I, Level of Comprehension (published by the Bureau
of Publications, Teachers College, Columbia
University), was administered. To appraise
literal and critical reading interpretation of science
material, the experimental edition of the
Intermediate Reading Test: Science was used.
To measure verbal intelligence, the Pintner
General Ability Test, Verbal Series, Intermediate,
Form A (published by World Book Company)
was given.


Terminology 


1. Literal reading is the ability to obtain a
low-level type of interpretation by using only the
information explicitly stated.

2. Critical reading is the ability to obtain a
level of interpretation higher than that needed
for literal interpretation. For this study the following
critical reading skills were employed:


a. Functional Vocabulary tests the reader’s 
background of experience in reference to 
a concept used in the selection. 

b. Semantic Variation of Vocabulary tests the 
reader's ability to identify a similar usage 
of a given word from the selection. 

c. Central Theme tests the ability to distin- 
guish the central topic of the selection 
from subordinate ones. 

d. Key Idea tests the ability to identify the key,
or most important, idea in the story.

e. Inference tests the ability to draw a specif- 
ic conclusion from facts explicitly stated. 


f. Generalization tests the ability to identify 
a general conclusion or principle from in- 
formation implicitly stated. 

g. Problem Solving tests the ability to apply
information from the selection to a problematic
situation.

h. Association of Ideas tests the ability to see
the relationship among ideas in a series.

i. Analogy tests the ability to perceive rela- 
tionship between two pairs of ideas. 

j. Antecedent tests the ability to recognize 
the word or words to which a selected pro- 
noun refers. 

k. Sequence tests the ability to determine a 
time sequence. 

l. Extraneous Idea tests the ability to determine
relevancy of ideas to a particular selection.

m. Following Directions tests the reader’s
ability to evaluate information as a preliminary
step to executing or rejecting printed
instructions.

n. Visualization tests the reader’s ability to 
interpret a graphic representation of an 
idea verbally presented in the selection. 


3. ‘‘Survey’’ Reading Comprehension is a
measure of understanding based on the results
of a reading test which uses content largely from
the field of literature. For this study, the comprehension
section of the Gates Survey was used.

4. ‘‘General’’ Reading Comprehension is used 
synonymously with ‘‘Survey’’ Reading Compre- 
hension in this study. 


A Review of Kindred Literature 


Few studies have been reported that are di- 
rectly relevant to a study of the reading compre- 
hension of fifth-grade children. Nevertheless, 
the conclusions from many other studies as well 
as the opinions of recognized authorities have 
contributed to the assumptions upon which this 
study was based. Accordingly, they are includ- 
ed in this survey of kindred literature which is 
here summarized in terms of the major conclu- 
sions reported. 


Reading as a Thinking Process


The following assumptions concerning read- 
ing as a thinking process appear to be supported 
by the literature reviewed (32, 33, 34, 5, 26): 


1. Reading and thinking are inseparable processes:
There is no reading without thinking.

2. Critical thinking is that type of thought
which utilizes the higher mental processes.
When critical thinking is used in the
reading situation, the process is called
‘‘critical reading.’’


Development and Appraisal of Critical Thinking 


A review of the research has tended to con- 
firm the following assumptions: 


1. Critical thinking has a number of components
(14, 25).

2. The elementary mechanism essential to
critical thinking develops gradually and is
usually present in the individual by the age
of seven (6, 8, 19).

3. Children observe the same general patterns
of thinking as adults but are limited in
reaching an equal degree of ability by their
lack of experience (18).

4. Growth in certain components assumed to
be inherent in critical thinking can be affected
by instruction (12, 14, 25).

5. Critical thinking abilities and those measured
by intelligence tests are not identical
(14, 25).

6. Certain critical thinking abilities can be
measured reliably and validly by paper-and-pencil
tests (13, 14, 25).


Subjective Analysis of Reading Comprehension 
Skills 


Four major approaches were made subjective- 
ly to the study of reading comprehension skills: 


1. The skills used by good readers were analyzed
(3, 15).

2. The errors and difficulties exhibited by
poor readers were examined (20, 4).

3. The reading task was structured (15, 23).

4. The skills specific to each content area
were determined (15, 23).


The findings from these investigations re- 
vealed that: 


1. A knowledge of many skills and abilities is 
essential to successful reading. 

2. Reading skills have been organized in terms
of reader purpose and of depth of comprehension.

3. Reading skills and abilities needed in several
content areas exhibit certain similarities
but, on the whole, tend to be specific
to each content area.


Experimental Analysis of Reading Comprehen- 
sion 


The evidence yielded by the experimental
studies of reading comprehension seems to support
the following major assumptions:


1. Reading is a composite of many components
and skills, many of which can be isolated,
identified, and manipulated to serve
in specific reading situations (10, 14).

2. Reading is essentially a thinking process
and requires for effectiveness all the elements
needed in critical thinking (10, 14,
26).

3. The ability to read a passage with literal
interpretation does not guarantee the ability
to interpret that selection critically (2,
10, 11, 13, 35).

4. The ability to read successfully in one content
area does not ensure equal success in
another (21, 23, 28).

5. Certain reading and thinking skills are
responsive to training (14, 27).

6. In order to obtain a valid measure of reading
ability in a given content area, selections
from that field must be used (17, 30).

7. The concept of ‘‘general reading ability’’
is not supported by scientific evidence (10,
28).


Test Construction 


A review of the literature on test construc- 
tion led to the following conclusions: 


1. Most standardized reading tests are designed
to measure the literal rather than
the critical interpretation of the printed
material (5, 10, 13).

2. To date, in the research on test construction,
reading and thinking have been treated
as separate entities (5, 10).

3. No standardized reading tests are now
available at the elementary school level
which measure children’s ability to think
critically about printed materials (24).

4. Test content should resemble as closely as
possible the major features exhibited by
the instructional materials of the content
area being measured (17, 30).

5. The structural characteristics of the test
must be well controlled (7).

6. The major objective factors which influence
the readability of printed materials
include those of (a) familiarity of vocabulary,
(b) average sentence length, and (c)
complexity of sentence structure (9, 16).


Summary of Procedure


A summary of the procedure followed in this
study is outlined below:


1. Constructed the preliminary edition of The
Intermediate Reading Test: Science in order
to obtain a measurement of literal and critical
reading achievement in science.

2. Appraised the reliability of this edition by
administering it to 143 children of grades
four, five, and six in a preliminary study.
Used item analysis on the test results.

3. Revised the test to serve the major purposes
of the study. Thereafter referred to the revised
edition as the experimental edition of
The Intermediate Reading Test: Science.

4. Administered the experimental edition to a
different population consisting of 513 fifth-grade
children.

5. Administered to the same population two
standardized tests:

a. Gates Reading Survey (Level of Comprehension)
to determine the ‘‘general’’ reading
ability.

b. The Pintner General Ability Test (Verbal
Series) to obtain an index of verbal intelligence.

6. Estimated the reliability of the experimental
edition of The Intermediate Reading Test:
Science by using the Kuder-Richardson
formula with the data obtained on the total
population.

7. Studied the reliability of each literal and critical
reading test item by:

a. Inspecting the responses of the total population.

b. Computing the Standard Errors of the Difference
Between Proportions with the
scores of the ‘‘good’’ and the ‘‘poor’’
readers (the upper and lower 27% of the
population).

c. Estimating the Pearson Product-Moment
correlations between achievement in literal
reading and in each critical reading
skill, using the upper and lower 27% of
the distribution.

8. Computed the Pearson Product-Moment intercorrelations
among the test variables:
literal reading achievement in science, critical
reading achievement in science, ‘‘general’’
reading achievement, and verbal intelligence
quotients.

9. Investigated the presence or absence of the
relationship between achievement in literal
reading and the passing and failing on each
critical reading test item by employing the
Chi-Square test of significance.

10. Determined the Point-Biserial correlation
between achievement on total scores of the
literal reading section of the experimental
edition and achievement on each critical
reading skill.


Summary of Results


Problem I: Literal and Critical Reading Comprehension


1. The Pearson Product-Moment formula
yielded a correlation of .67 ± .02 between
literal and critical reading comprehension
in science. With intelligence held constant,
the correlation was .34.


Problem II: ‘‘Intelligence’’ and Reading Comprehension


2. The Pearson Product-Moment formula
yielded a correlation of .83 ± .01 between
the Pintner Intelligence Test (Verbal) and
the Gates Reading Survey (Level of Comprehension).

3. The Pearson Product-Moment formula
yielded a correlation of .75 ± .02 between
the Pintner Intelligence Test (Verbal) and
the literal reading section of the Intermediate
Reading Test: Science.

4. The Pearson Product-Moment formula
yielded a correlation of .67 ± .02 between
the Pintner Intelligence Test (Verbal) and
the critical reading section of the Intermediate
Reading Test: Science.


Problem III: ‘‘General’’ Reading Comprehension
and Literal and Critical Reading Comprehension
in Science


5. The Pearson Product-Moment formula
yielded a correlation of .75 ± .02 between
the Gates Reading Survey (Level of Comprehension)
and the literal reading section
of the Intermediate Reading Test: Science.
With intelligence held constant, the correlation
was .35.

6. The Pearson Product-Moment formula
yielded a correlation of .60 ± .02 between
the Gates Reading Survey (Level of Comprehension)
and the critical reading section
of the Intermediate Reading Test: Science.
With intelligence held constant, the
correlation was .11.


Problem IV: Literal Reading Achievement and
Achievement in Each Critical Reading Skill
(Science)


7. The Point-Biserial formula yielded correlations
ranging from -.15 to +.47 between
achievement on the total literal reading
section and performance on each critical
reading test item. Table XIII presents the
results.

[Table XIII: point-biserial correlation between total score on the literal reading section and the passing or failing of each critical reading item (N = 513). Table values are not recoverable from the source.]


Conclusions


Within the limitations stated, the following
conclusions on each problem seem to be valid:


Problem I:


1. There is a substantial relationship between
literal and critical reading comprehension
in science.


Problem II: 


2. There is a very high relationship between 
verbal intelligence and ‘‘general’’ reading 
ability. 

3. There is a high relationship between verb- 
al intelligence and proficiency in literal 
reading in science. 

4. There is a substantial relationship between 
verbal intelligence and proficiency in criti- 
cal reading in science. 


Problem III:


5. There is a high relationship between ‘‘gen- 
eral’’ reading comprehension and literal 
reading comprehension of science materi- 
als. 

6. There is a substantial relationship between
‘‘general’’ reading comprehension and critical
reading comprehension of science materials.


Problem IV: 


7. There is a very low or negligible relation- 
ship in science between proficiency in lit- 
eral reading and in each of the respective 
critical reading skills. 


The conclusions for this study may be sum- 
marized as follows: 


1. Critical reading comprehension in science is 
a complex of skills or abilities, each of which 
is relatively independent of the ability to read 
literally. 

2. Proficiency in critical reading of science materials
cannot be predicted from scores obtained
(a) on literal reading tests in science,
(b) on group tests of verbal intelligence, or
(c) on ‘‘general’’ reading tests.

3. Proficiency in literal reading interpretation
of science materials may be predicted with a
fair degree of accuracy from scores on group
tests of verbal intelligence and ‘‘general’’
reading tests.

4. Group tests of verbal intelligence and ‘‘gener- 
al’’ reading tests tend to measure many com- 
mon abilities. 


Implications 


These four general conclusions seem to justi- 
fy the following implications for education: 


In planning an analysis program, both curri- 
culum workers and the personnel responsible for 


the testing program need to give serious thought 
to the limitations of a group test of verbal intel- 
ligence for measuring the capacity of retarded 
readers. Consideration also needs to be given 
to the inadequacy of either the standardized reading
tests available at the elementary school level
or the group tests of verbal intelligence for predicting
proficiency in critical reading comprehension
in science. These school personnel should
realize that, since critical reading comprehension
embraces relatively independent abilities,
a valid diagnosis of that complex can be made
only by measuring proficiency in each specific
critical reading skill and by using materials of
specialized content. Accordingly, a complete
elementary school analysis program should include,
in addition to other tests, (1) an instrument
for measuring the capacity of retarded
readers, and (2) tests of critical reading comprehension
that would yield a measure of proficiency
in each specific critical reading skill in
each given content area.

There is a crucial need for a new type of in- 
strument to measure reading comprehension at 
the elementary school level. The proposed in- 
strument necessarily would include items de- 
signed to measure the relatively independent abil- 
ities inherent in critical reading comprehension 
of a particular content. The obtained scores 
could not be expressed as a composite score but 
probably could be presented in profile form. 
This would show the relative strength or weak- 
ness in each specific critical reading skill and, 
therefore, could serve as a guide in the prepar- 
ation of the instructional program in the various 
content areas. 

Classroom teachers should recognize that, 
since critical reading ability consists of relative- 
ly separate abilities, the best procedure for de- 
veloping critical reading proficiency is by pro- 
viding instruction in each specific skill. For op- 
timum results, this instruction needs to be sys- 
tematic and direct, e.g., in order to develop 
problem-solving skill in science, opportunities 
for solving problematic situations by using sci- 
ence content should be afforded. By improving 
ability in each specific critical reading skill,
the general level of critical reading ability
could be raised.

Another implication from the study may be 
stated as a caution to classroom teachers. 
From the results obtained in this study it ap- 
pears obvious that this representative population 
of elementary school children tended to be low 
achievers in critical reading comprehension in 
science. This was equally true for children of 
superior intelligence as for those of low intelli- 
gence. It appears vital to urge the classroom 
teacher not to take the development of critical 
reading comprehension for granted as a concom- 
itant of normal or superior intelligence, but to 


realize that it is a skill that needs development. 
Consequently, the teacher should provide, for 
all children, systematic instruction in each of 
the critical reading skills needed for successful 
interpretation of each specialized content. 


Suggestions for Further Research 


A more exhaustive analysis of critical reading
comprehension seems to be in order. Among
the problems that need investigation are the following:


1. Construction of instruments for diagnosing
critical reading comprehension in the
various content areas.

2. Study of the relationship between literal
and critical reading comprehension (a) at
other grade levels, and (b) in other content
areas.

3. Investigation of the effect of systematic,
directed instruction in critical reading
skills (a) at each elementary grade level,
and (b) in each content area.

4. A factorial analysis of critical reading
comprehension.


SELECTED BIBLIOGRAPHY 


1. Adkins, Dorothy, et al. Construction and
Analysis of Achievement Tests (Washington,
D.C.: Superintendent of Documents, U.S.
Government Printing Office, 1947).

2. Bedell, Ralph C. The Relationship Between
the Ability to Recall and the Ability to Infer
in Specific Learning Situations, Doctor’s
Dissertation, University of Missouri, 1934.

3. Berry, E. T. ‘‘Improving Freshmen Reading
Ability,’’ English Journal, XX (December
1931), pp. 824-29.

4. Berry, E. T., and Touton, F. C. ‘‘Reading
Comprehension at the Junior College
Level,’’ California Quarterly of Secondary
Education, VI (April 1931), pp. 245-51.

5. Betts, Emmett A. ‘‘Guidance in the Critical
Interpretation of Language,’’ Elementary
English (January 1950), pp. 9-22.

6. Burt, Cyril. ‘‘The Development of Reasoning
in School Children,’’ Journal of Experimental
Pedagogy, V (December 1919), pp.
121-27.

7. Conant, Margaret M. The Construction of
a Diagnostic Reading Test (New York: Bureau
of Publications, Teachers College,
Columbia University, 1942).

8. Croxton, W. C. ‘‘Pupils’ Ability to Generalize,’’
School Science and Mathematics,
XXXVI (June 1936), pp. 627-34.

9. Dale, Edgar, and Chall, Jeanne. A Formula
for Predicting Readability (Columbus, Ohio:
Bureau of Educational Research, Ohio State
University, 1948).

10. Davis, Frederick B. Fundamental Factors
of Comprehension in Reading, unpublished
Doctor’s Dissertation, Harvard University,
1941.

11. Dewey, Joseph C. ‘‘The Acquisition of Facts
as a Measure of Reading Comprehension,’’
Elementary School Journal, XXXV (January
1935), pp. 346-48.

12. Ferrell, Frances H. ‘‘An Experiment in the
Development of Critical Thinking,’’ American
Teacher, XXX (January 1946), pp. 24-[illegible].

13. Gans, Roma. A Study of Critical Reading
Comprehension in the Intermediate Grades,
Contributions to Education, No. 811 (New
York: Bureau of Publications, Teachers College,
Columbia University, 1940).

14. Glaser, Edward M. An Experiment in the Development
of Critical Thinking, Contributions
to Education, No. 843 (New York:
Bureau of Publications, Teachers College,
Columbia University, 1941).

15. Gray, William S., et al. Reading in General
Education, A Report of the Committee on
Reading in General Education (Washington,
D. C.: American Council on Education,
1940).

16. Gray, William S., and Leary, Bernice E.
What Makes a Book Readable (Chicago:
University of Chicago Press, 1935).

17. Hall, William E., and Robinson, Francis P.
‘‘An Analytical Approach to the Study of
Reading Skills,’’ Journal of Educational Psychology,
XXXVI (October 1945), pp. 429-[illegible].

18. Hazlitt, Victoria. ‘‘Children’s Thinking,’’
British Journal of Psychology, XX, Part
III (July 1929-April 1930), pp. 354-61.

19. Heidbreder, Edna F. ‘‘Problem Solving in
Children and Adults,’’ Journal of Genetic
Psychology, XXXV (December 1928), pp. [illegible].

20. Hilliard, George H. Probable Types of Difficulties
Underlying Low Scores in Comprehension
Tests, Studies in Education, II, No. [illegible]
(Iowa City: University of Iowa, 1924).

21. Humber, W. J. An Experimental Analysis of
Selected Reading Skills as Related to Certain
Content Fields at the University of Minnesota,
unpublished Ph.D. dissertation, University
of Minnesota, 1942.

22. Kuder, G. Frederic, and Richardson, M. W.
‘‘The Theory of the Estimation of Test Reliability,’’
Psychometrika, II, No. 3 (September
1937), pp. 151-60.

23. McCallister, James M. Remedial and Corrective
Instruction in Reading (New York:
D. Appleton-Century Co., 1936).

24. McCullough, Constance M., et al. Problems
in the Improvement of Reading (New
York: McGraw-Hill, 1946).

25. National Council for the Social Studies,
Teaching Critical Thinking in the Social
Studies, Thirteenth Yearbook (Washington,
D. C.: National Education Association, National
Council for the Social Studies, 1942).

26. Russell, David H. ‘‘Reading for Critical
Thinking,’’ California Journal of Elementary
Education, XIV (November 1945),
pp. [illegible].

27. Salisbury, Rachel. A Study of the Effects of
Training in Logical Organization as a
Method of Improving Skill in Study, Ph.D.
dissertation, University of Wisconsin,
1934.

28. Shores, J. Harlan. Reading and Study
Skills as Related to Comprehension of Science
and History in the Ninth Grade, Ph.D.
dissertation, University of Minnesota,
1940.

29. Sochor, E. Elona. Literal and Critical
Reading in Social Studies, Doctor’s dissertation,
Temple University, 1952.

30. Swenson, Esther J. The Relation of the Ability
to Read Material of the Type Used in
Studying Science to Eighth-Grade Achievement,
Master’s thesis, University of Minnesota,
1940.

31. The Teaching of Reading: A Second Report,
Thirty-sixth Yearbook of the National Society
for the Study of Education, Part I
(Bloomington, Ill.: Public School Publishing
Co., 1937).

32. Thorndike, Edward L. ‘‘The Psychology of
Thinking in the Case of Reading,’’ Psychological
Review, XXIV (May 1917),
pp. [illegible]-34.

33. Thorndike, Edward L. ‘‘Reading as Reasoning:
A Study of Mistakes in Paragraph
Reading,’’ Journal of Educational Psychology,
VIII, No. 6 (June 1917), pp. 323-32.

34. Thorndike, Edward L. ‘‘The Understanding
of Sentences,’’ Elementary School Journal,
XVIII (October 1917), p. 114.

35. Tyler, Ralph W. ‘‘Measuring the Ability to
Infer,’’ Educational Research Bulletin,
IX, No. 17 (November 19, 1930), pp.
475-80.


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, September 1958) 


CHILDREN’S PERCEPTIONS OF RELATION- 
SHIPS AMONG THEIR FAMILY 
AND FRIENDS 


IVAN N. MENSH and JOHN C. GLIDEWELL** 
St. Louis, Missouri 


Introduction 


UNTIL RECENTLY, the personal and social 
development of children has been studied princi- 
pally from the point of view of adults. It has be- 
come increasingly evident that not only are adult 
perceptions of child behavior important to our 
understanding, but there also is necessary some 
insight into children’s perceptions of themselves, 
their peers and the adult world. A number of 
techniques have been developed to assess these 
perceptions, ranging from theclinical interview 
through play and other projective methods to 
more structured tests. For example, children 
in the age range of four to fourteen years have 
been studied variously by Del Solar (2), Griffiths 


(5), Herbst (6), Mott (9), and Rogers (10), bydi- © 


rect observation, interview, questionnaire and 
other test techniques. In these studies there 
have been reported children’s perceptions of 
a) child care and control (6); b) household, social
and economic relationships and activities (6); 
c) behavior difficulties of children as they them- 
selves see them (5); and d) self and other rela- 
tionships (10). In one of these investigations, 
Griffiths studied 900 school children ranging in 
age from 6 to 14, in a research program designed
to test three major hypotheses. These
were:


1. That young children of the early elementary
school level are aware mainly of those difficulties
characterized by overt and aggressive
behavior, but that with increasing age they become
increasingly aware of difficulties of a
submissive or withdrawing nature.


2. That as children grow older, parents’, teachers’,
and children’s judgments of behavior
difficulties are in greater agreement.


3. That children from the middle socio-economic
groups are essentially conformists;
they show fewer overt and aggressive behavior
difficulties, but more difficulties of a
submissive or withdrawing nature, than children
from the upper and lower socio-economic
groups (5).


*All footnotes will be found at end of article.


Data to test these hypotheses were gathered 
from parents and teachers by questionnaire, and 
from the children by interview. The behaviors 
analyzed were categorized as aggressive, delin- 
quent-related, withdrawing, or non-compliant. 
The last category applied only to the home situation;
therefore teachers’ replies were not scored for
this category. In general, the findings supported 
the hypotheses. 

In a similar study but with a much smaller 
sample (36 children ranging in age from 6 to 12), 
Del Solar (2) reported that children perceive their
teachers’ goals for the children as improvement
in academic achievement and in classroom behavior.
In relation to parents, the children reported
their recognition of 1) parental authority, 2) their
own personal responsibility, and 3) the need for
improved sibling relationships. Such data have
suggested the significance of children’s perceptions
as an area affording new insights into behavior. 


I. Subjects and Family Structure Study Procedure 


As part of a series of classroom observations 
in a larger project (3,4) and in studying children’s
perceptions of family structure and interrelation- 
ships, the method of asking the children to list 
their family members, best boy friend, and best 
girl friend, and to rank their preferences among 
them was adopted as a technique for sampling 
childhood perceptions of the family constellation. 
During the same experimental session in which 
these data were obtained, the children also re- 
sponded to a series of sociometric choices (cf. 
Part II in this study) within their respective class- 
room units. The children were studied by a team 
of four observers in a regularly scheduled morn- 
ing session, during which the teacher was out of 
the classroom. At the time of observation the 
teachers were asked to rate the children on a four-point
scale of general adjustment. In addition,


there were available for study reports from 1) 
the school records, 2) the mental health service 
workers in contact with the mothers and with 
school personnel, and 3) in a few cases (N = 6), 
a Child Guidance Clinic. Also, at approximately
the time of the observations in the classrooms,
the mothers were interviewed in their homes by
trained interviewers during a two-hour period.
Thus, data from 1) direct observation of the
children in the social structure of the school
classroom, 2) their mothers in the home situation,
3) the teachers, and 4) the mental health
workers were gathered independently. The latter
two independent sources of data were used
as criteria against which to evaluate the children’s
behavior. These raters were instructed
in the definitions of general adjustment level
developed from Ullmann’s earlier study (13).


1. A child who is unusually well-adjusted in his
relationships with others and in his accomplishments.


2. A happy child who gets along well and accomplishes
reasonably well the things that usually
go with his age and level of development.


3. A child who is not so happy as he might be;
has moderate difficulties in getting on; growing
up presents something of a struggle.


4. A child who now has, or at his present rate
is likely sooner or later to have, serious
problems of adjustment and needs or may
need special help or care because of such
problems.


The sample under study consisted of third-grade
children whose level of reading and writing
necessitated a minimum demand on these
skills. For the family structure perceptions,
the words for family members, ‘‘father’’,
‘‘mother’’, etc., were printed on the blackboard
and copied by the children under the direction
of several observers. ‘‘Family’’ was defined
as all children and adults living in the home.
‘‘Roomer’’ and ‘‘boarder’’ also were included
where necessary, as were grandparents and
other relatives who lived as members of the
household. The family was listed according to
age, including ‘‘me’’, for all subjects. At the
bottom of the prepared form, two spaces were
assigned, one for ‘‘best boy friend in the room’’,
and the second for ‘‘best girl friend in this
room’’. Following this reporting of ‘‘family
and friends’’, each child individually reviewed
his list with an observer to insure accuracy of
report.

The second step in this method of sampling
childhood behavior consisted of each child rank-ordering
his preferences for all family members
and for his two best friends, rating the most
preferred individual as ‘‘1’’, the next as ‘‘2’’,
etc. Again his reporting was individually checked
by one of the several observers. There were oc- 
casional instances in which a girl or boy was re- 
luctant to indicate ‘‘best friend’’ of the opposite 
sex but this was overcome by quietly talking with 
the child to complete the ‘‘game’’. When complet- 
ed, the data obtained by this method included lists 
and preferences of family and friends for a total 
of 91 third-grade youngsters from three class- 
rooms, one from each of three adjacent public 
school districts. Although, in their rankings, 13 
of the 91 children had incorrectly followed in- 
structions, ten sets of ratings were corrected 
without difficulty and only three were unusable in 
the data analysis. 

The data permitted the testing of several hy- 
potheses which had been developed (3,4) in the 
larger project. In an earlier study, Rogers (10) 
had reported some of these hypotheses and tested 
them in the development of his ‘‘Test of Personality
Adjustment’’.¹ The first stated that degree of
disturbance in the child as assessed by teacher 
and trained mental health worker was related to 
his perceptions of the family constellation. Thus, 
it was hypothesized that a child who rated one parent
as ‘‘1’’ and the other as ‘‘3’’ or more would
show greater disturbance than one who rated the
parents in ‘‘1-2’’ order. Similarly, children who
preferred friends to sibs, and, also, children 
who rated those sibs just next to them in birth or- 
der as least preferred (sibling rivalry) among the 
sibs and friends would be rated as disturbed. 

Along another dimension of the present study, 
the hypothesis was formulated that size of family 
would be related to the degree of disturbance in 
the child. The analyses were designed also to
test the hypothesis that sex of the child would differentially
contribute to the family and friend preferences.
Finally, as a result of a previous study
(1), it was hypothesized that the presence of grandparents
in the home would be related to disturbance
in the child. There were one or both grandparents
in ten of the 88 family homes, perhaps
constituting a subculture with psychological char- 
acteristics differentiating these homes from the 
other 78 homes. 

A scoring system was devised to include the 
variations of preferences within the family, based 
upon a total score of ten from which a point was 
subtracted for each of the following rank orders: 


a. Father and/or mother rated other than 1 
or 2; 

b. Friends preferred to grandparent(s); 

c. Friends preferred to parents and/or sibs; 

d. Other adults preferred to grandparents,
parents and/or sibs; 

e. Sib next to subject least preferred. 




The empirical scoring system just described 
may be contrasted with a similar one developed 
by Rogers. The children’s ratings were scored 
by both systems. Rogers’ scores constituted 
one of four subtests which contributed to a ‘‘family
maladjustment score’’ in evaluating responses
to his ‘‘Test of Personality Adjustment.’’
The test manual recommended the following
scoring system (10):


If there are two or more sibs and one of the 
sibs next to the subject is given the highest 
number in the family, 1 point. 


If one of the friends is given a lower number 
than some member of the family, 2 points. 


If parents are separated by two ratings (e.g., 
mother rated ‘‘1’’, father rated ‘‘3’’), 2 points. 


If parents are separated by more than two rat- 
ings, 4 points. 


If parents receive highest number, 2 points. 


The data, scores obtained from the third- 
graders’ preferences among their respective 
family members and classroom friends, were 
treated by analyses of variances associated with 
the variables of school, sex and criterion group 
(ratings of general adjustment); and by chi 
square analyses where frequencies constituted 
the data. 


Results 


The series of analyses of the children’s re- 
sponses tested their preferences for one or both 
friends over family members, one or more sibs 
over the parents, grandparents over parents, 
father over mother, and sib next to subject least 
preferred in the array. Among the 46 boys, 14
listed preferences in a modal or ‘‘normal’’ order,
as did 19 of the 42 girls. This order listed
mother, father, sibs and friends, successively,
and the sib next to the subject was not least preferred.
Thus, a modal pattern, following an
a priori hypothesis about order of preference,
was exhibited by more than a third of the subjects.
Two other patterns appeared: father preferred
to mother, and sibling next to the subject
in age rated as least preferred. Twelve children
exhibited the former pattern and 15 the sibling
order. In seven other instances the children
rated a sibling as more preferred than one
of the parents. Each of the remaining seven patterns
was indicated by only one to four children. Thus,
67 of the 88 children rated their preferences in
one of four patterns.

Chi square analyses of frequency distributions
(using Yates’ correction for discontinuity,
12) by criterion group, sex, and patterns of preferences,
totalling 22 analyses for the various distributions,
yielded no significant probabilities for
the chi square values. These values ranged from
.00 to 2.60, none approximating the value required
for a significant probability (3.84 at P = .05;
6.64 at P = .01). These findings indicate that there
were not significant relationships, for either boys 
or girls, between preference patterns and general 
adjustment as rated by teachers and trained men- 
tal health workers. This was true also for the 
pattern of ‘‘normality’’ which did not occur signif- 
icantly more often among boys or girls, or among 
those rated as adjusted or poorly adjusted. 
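
As an illustration of the kind of test described above, the following sketch computes a Yates-corrected chi square for a single 2 x 2 table, using the counts reported earlier for the modal pattern (14 of 46 boys, 19 of 42 girls). Whether this exact table was one of the 22 analyses is an assumption, and the function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Chi square with Yates' correction for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink |ad - bc| by n/2, never letting the corrected difference go below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / margins


# Rows: boys, girls. Columns: modal pattern, any other pattern.
chi2 = yates_chi_square(14, 46 - 14, 19, 42 - 19)
significant_at_05 = chi2 >= 3.84  # critical value quoted in the text
```

The resulting value is about 1.47, inside the reported range of .00 to 2.60 and below the 3.84 needed at the .05 level.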

The size of the family ranged from one child 
and his parents to nine children and their parents, 
with the mean number of children 2.6 per family. 
The households were distributed as follows: 


Family Constellation

Grandfather with family
Grandmother with family
Both parents with family
Other adults with family
Only one child in family
Parents and two or more children


Analyses of the variances associated with size 
of family, sex of the third-graders, and criterion
group assignment (adjustment level) also did not 
produce significant probabilities, either for these 
effects or for interactions among the effects. The 
F-ratios varied from .87 to 2.26, values well be- 
low those required even at the .05 level of signif- 
icance. 

In treating the data on presence of grand- 
parents in the home and this effect upon prefer- 
ences and general adjustment, the incidence of 
grandparents in the home was low, as reported 
above, with about twelve percent of the children 
living in households with grandparents as mem- 
bers. As with the analyses just reported, there 
were not significant differences between preferences
and adjustments of children with grandparents
living in the family home and those without
grandparents in the home. 


Discussion 


The series of hypotheses tested here had been 
derived from clinical experience and constituted 
the principal elements of the design. Except for 
the ‘‘modal’’ pattern, these hypotheses did not 
find support in the data obtained from the 91 third 
grade children in the present study. It is not uncommon
for clinical ‘‘hunches’’ and other experiences
to go unsupported in controlled studies. This
has been demonstrated variously by Kelly and
Fiske (7), Meehl (8), and Rogers et al. (11). In
these assessment and prediction studies, follow-
ups of the clinical interview, case history and 
test data did not successfully predict behavior in 
spite of the clinical traditions surrounding their 
use. It should be emphasized, however, that 
the investigators were aware of the yet-unsolved 
criterion problems. 

Another interpretation of the present findings 
suggests that these results may be a function of 
the developmental stage at which the data were 
obtained: the preadolescent age range of 8-10 in
which the family still dominates the personal 
and social worlds of the third-graders; and ex- 
tra-family group behavior, even in the social 
unit of the classroom, does not yet have greater 
influence. Also, the ordinal position of the child 
in the family hierarchy and the special charac- 
teristics of the homes with grandparents as part 
of the family units may be determinants in children’s
preferences. Finally, the a priori, ex-
pected pattern of intra-family preference did ap- 
pear to a significant degree, suggesting the po- 
tency of this cultural characteristic. 


II. Sociometric Procedure


During the classroom observations described 
earlier, the teacher was out of the room for the 
entire morning period. The classroom was se- 
lected as the social environmental unit and socio- 
metric choices restricted to this unit. These 
data were designed to yield information on both 
the number and quality of peer relationships, 
two of the dimensions of study. Each child was 
given six copies of a class seating chart on which 
all children were identified by name and number.
The names were those used by the children in 
their usual interactions, e.g., ‘‘Larry’’, ‘‘Rus- 
ty’’, ‘‘Bobby’’, rather than the formal, given 
name; and the numbers were those worn for 
identification in a halter arrangement by the chil- 
dren throughout the morning. Six separate 
sheets were used in an attempt to keep as inde- 
pendent as possible the six sociometric choices 
desired from each child. The observers dis- 
couraged the few children who looked at earlier 
judgments, reminding them of the instructions 
given previously not to look back (at the sheets 
beneath the one on which they were working at 
the moment). 

The six sets of judgments asked of the chil- 
dren were ‘‘decide which three boys or girls in 
this room you most like’’, ‘‘most like to play 
with’’, ‘‘do not like’’, ‘‘do not like to play with’’,
‘‘ask you to do things you yourself don’t want to
do’’, and ‘‘do not ask you to do things you your- 
self don’t want to do’’. The latter two judgments 
were designed to get information about ‘‘demand- 
ingness’’, another of the dimensions considered 
significant in child behavior, as evolved in the 
research program (3,4) from conferences with 


the mental health service workers. The instruc- 
tions given the children follow: 


Please look at the paper in front of you.
You see that the name of every boy and girl
in this room is on the sheet, and the number
is next to the name. Now decide which
three boys or girls in this room you most
like. (Pause) Put a ‘‘1’’ by each of their
numbers (demonstrated on blackboard).
(Pause) Be sure there are three 1’s on the
sheet, and that these 1’s are next to the
numbers of the boys or girls you most like
in this room. Now put a second ‘‘1’’ (demonstrated)
by the number of the boy or girl
you like most of all in this room. (Observers
check responses to see that each child
has three 1’s, one of which is double.) Now
put the sheet under the others on your desk.


Results 


A number of statistical problems arose in treat- 
ing the data, from the disproportional distribution 
of girls and boys, numbering 44 and 47, respec- 
tively, over the four criterion groups (N’s of 22, 
40, 19 and 10), the varying classroom sizes (N’s 
of 35, 33 and 23), and the varying restrictions of
choices among the children. In meeting the first
problem, a correction for disproportionality (12)
statistically controlled the differences in N’s over
the criterion groups; in the latter analysis, restriction
of choices was treated as a variable of
significance in the sociometric designs, i.e., it
was hypothesized that a child who restricted his
choices to a single classmate differs psychologi- 
cally from another who nominates three different 
peers in his choices. 

In order to handle the second problem, a scor- 
ing system was devised to account for the varying 
numbers of choices. This latter was taken from 
the following nomograph: 


                        Number of Positive Choices
                        0-1    2-3    4-5    6-21

Number of     0-1        4      5      6      7
negative      2-3        3      4      5      6
choices       4-5        2      3      4      5
              6-14       1      2      3      4


It can be seen that a score of 7 was assigned
individuals chosen positively six or more times
and with no, or no more than one, negative choice,
and a score of 1 was given a child who was chosen
negatively six or more times and who had received
no, or no more than one, positive choice. The
coded scores distributed themselves as follows:


Score 1 2 3 4 5 6 7 
N 8 9 11 30 16 10 7 
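

The nomograph and the score distribution above can be checked with a short sketch. Every cell of the nomograph equals 4 plus the positive-choice band minus the negative-choice band, so a lookup reduces to two band computations. The function and variable names are hypothetical, and the upper limits shown in the nomograph (21 and 14) are treated here as open-ended top bands, which is an assumption.

```python
import bisect

# Lower bounds of the second, third and fourth bands (0-1, 2-3, 4-5, 6 or more).
BAND_EDGES = [2, 4, 6]


def band(count):
    """Map a number of choices received to its band index 0..3."""
    return bisect.bisect_right(BAND_EDGES, count)


def sociometric_score(positive, negative):
    """Coded score from the nomograph: 4 + positive band - negative band."""
    return 4 + band(positive) - band(negative)


# Distribution of coded scores as reported in the text.
score_counts = {1: 8, 2: 9, 3: 11, 4: 30, 5: 16, 6: 10, 7: 7}
n_children = sum(score_counts.values())
mean_score = sum(score * n for score, n in score_counts.items()) / n_children
```

The reported frequencies sum to the 91 children, and their mean of roughly 4.04 agrees with the school means given in the following paragraph.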


Analyses of distributions of scores by sex, 
school and criterion group were made, with the 
results indicating significant differences (P = .01) 
among the latter, but not for the sex and school 
variables. The mean scores of the criterion 
groups ranged from 5.0 to 3.2—5.0, 4.2, 3.2 
and 3.2 for groups 1 (best-adjusted, according 
to teacher and mental health service worker rat- 
ings) to 4 (in need of or receiving psychiatric 
help), respectively. Mean scores of the boys 
varied from 4.5 to 3.0, and for the girls from 
5.1 to 3.1, but the differences between the sexes 
(boys’ mean, 3.8; girls’ mean, 4.3) were not 
significant. Mean scores among the three 
schools did not vary significantly (4.03, 4.06 
and 4.04), nor did means of the various restric- 
tion groups, termed ‘‘restriction of interperson- 
al relationships’’. The latter consisted of three
categories of children—in the first were those 
children who selected a different peer for each 
of the sociometric choices, in the second were 
those who selected the same child for two of the 
three choices (either positive or negative), and 
in the third were those who nominated the same 
peer for each of the three choices, i.e., a child 
restricted his interpersonal relations to a single
classmate in his positive choices and to a second
classmate in his negative choices. The means 
of these three groups were 4.0, 4.1 and 4.1, re- 
spectively. Although these mean sociometric 
scores did not vary significantly, interaction ef- 
fects of school and IPR (the ‘‘interpersonal re- 
striction’’ manifested by the children in select- 
ing sociometric choices) were significant, indi- 
cating that the association between IPR and
sociometric score varied from school to school. Thus, in
Schools 1 and 2, children who selected a differ- 
ent peer for each sociometric choice were 
chosen positively more often than in School 3; 
but in School 3 the mean sociometric score was highest
(5.0, contrasted with 2.25 and 3.80 for Schools 
1 and 2, respectively) for those children who re- 
stricted their choices. 

The teachers’ ratings of general adjustment 
were consistent with the sociometric choices of 
the pupils, with the latter’s choices ranging in 
scores from 4.2 to 5.1 for the students rated 1 
or 2, and 3.0-3.3 for the students rated 3 or 4 
by the teachers. It should be remembered that 
the ratings and the sociometric scores were ob- 
tained independently, although, of course, both 
teachers and pupils may have based some or all 
of their judgments on the same behaviors. These 
data then indicate that, with teachers’ ratings 
as independent criteria, students’ sociometric
choices as early as the third grade of school significantly
differentiate the levels of adjustment of
the pupils. And this discrimination occurs on the
basis of student choices related to ‘‘like’’, ‘‘play
with’’, and demandingness, without reference to
the adjustment dimension along which teachers 
rated the children. 

In another analysis, by the chi square method, 
the frequency with which boys chose other boys or 
chose girls, and vice versa, was examined. For 
example, did the boys and girls tend to pick the 
same or opposite sex classmates for their posi- 
tive and negative choices? The chi square proba- 
bility values (P) varied by school and sociometric 
variable, as shown below. (NS indicates that the P value
of the obtained chi square was greater than .05.)


                                   School
Sociometric Area              1      2      3

Like most                     NS     NS     .05
Like most to play with        .01    .01    .05
Least demanding               NS     NS     NS
Like least                    .05    NS     .05
Like least to play with       .01    NS     .01
Most demanding                NS     NS     NS


These data predominantly show significant 
sex discriminations only in the play area, i.e., 
boys and girls chose their same sex significant- 
ly more often than they chose the opposite sex. 
In contrast, on the ‘‘demandingness’’ dimension, 
there were no significant discriminations, with 
both boys and girls choosing their same or the op- 
posite sex in roughly similar proportions. In the 
‘liking’ area, only three of the six chi squares 
had significant P-values, with the children of 
School 2 not discriminating on the basis of sex, 
unlike the School 3 boys and girls. With the one 
exception in the area of ‘‘like most to play with’’, 
School 2 children did not select the same sex in 
their choices any more often than they chose 
classmates of the opposite sex, unlike the stu- 
dents of Schools 1 and 3. In general, the data in- 
dicate that sex differences are sharply drawn for 
play but are less distinct for ‘‘liking’’, and do not 
seem to operate in interactions in which these 
third-graders perceive classmates as ‘‘demanding’’.


Discussion 


The present experiment has resulted in a sys- 
tem of scores which summarizes both the num- 
ber and quality (positive or negative) of socio- 
metric choices. These scores distributed them- 
selves normally, and significantly discriminated 
the four criterion groups. Thus, criterion 
ratings from trained mental health worker and 
classroom teacher, and student choices independently
made, were significantly related. Distri-
butions of scores between the sexes and among 
the schools showed no significant variations 
along these two dimensions. 

The hypothesis about the relationship of gen- 
eral adjustment, as rated by teachers and men- 
tal health workers, to the degree with which the 
subjects restricted their sociometric choices, 
was not supported by the data. The analysis was 
designed to test the idea that the more disturbed 
children, as judged by adults, make fewer rela- 
tionships than those students who have been rat- 
ed as better adjusted. The significant interac- 
tion effects between school and IPR (interperson- 
al restriction) do, however, indicate that the re- 
lationship between these factors varies from 
school to school. There was, further, support 
for the hypothesis that IPR is related, in peer 
judgments, to adjustment, for the data demonstrate
that children who had but few interactions
in the classroom were less frequently positively 
chosen and more often negatively chosen than 
those whose range of sociometric choice indicat- 
ed greater adjustment in the social situation. 

Sex differences were consistently observed, 
with but one exception, only in the dimension of 
play; and just the opposite obtained in the dimen- 
sion of demandingness where boys and girls did 
not discriminate sex in their choices, whether 
positive or negative, and no more often chose 
same or the opposite sex. Again, there were 
differences among the schools, differences which 
have been found consistently with respect to 
other measures in the overall evaluation pro- 
gram. 


Summary 


As part of an evaluation program in community
mental health service activities, children
were observed in the classroom situation as the 
social unit. A sample of 91 third grade school 
children were asked to indicate the members of 
their household, including family, relatives and 
other members regularly living in the home. To 
this list were added the ‘‘best boy friend’’ and
the ‘‘best girl friend’’ of each. The entire list 
then was rank-ordered for preferences. Several 
hypotheses derived from clinical experience 
were put to this controlled test—preferences of 
friends over parent and sibs, sibs over parents, 
grandparents over parents, father over mother, 
and sib next to the subject rated as least pre- 
ferred; grandparents in the home; sex of the child;
and size of family; all related to disturbance in 
the child. None of these hypotheses found sup- 
port in the data from the sample of children in 
the present study. There was, however, a
‘‘modal’’ perception of the family constellation
which corresponds to the expected pattern in 
our culture. 


Another set of observations consisted of a ser- 
ies of sociometric choices—‘‘like most’’, ‘‘like to 
play with’’, ‘‘not demanding’’, ‘‘like least’’, ‘‘do 
not like to play with’’, and ‘‘demanding’’. These 
data are reported in terms of the number and qual- 
ity of peer relationships, their relationship to rat- 
ings of general adjustment, and sex and school 
differences. To enable the treatment of data 
gathered from the children’s sociometric choices,
a scoring system has been devised to provide an 
index of both number and direction (positive or 
negative) of choice. It was found that these socio- 
metric scores significantly differentiated the four 
levels of mental health used as criterion meas- 
ures. Further, it is significant that there is high 
correspondence between the criterion ratings by 
trained mental health workers and by classroom 
teacher, and the independently made pupil socio- 
metric choices. Thus, these data indicate that, 
with ratings by trained adult raters as independ- 
ent criteria, students’ sociometric choices as 
early as the third grade of school significantly dif- 
ferentiate the four levels of psychological adjust- 
ment of school children here specified. Finally, 
children with but few classroom interactions are 
more often negatively chosen and less often posi- 
tively chosen by their classmates, suggesting that 
the degree of interaction in the class may be one 
of the socio-psychological dimensions along which 
the children range their evaluations of class- 
mates. The data summarized here also indicate 
significant variations in these interactions from 
school to school. 


REFERENCES 


1. Buchmueller, A. D., and others. ‘‘A Group
Therapy Project with Parents of Behavior
Problem Children in Public Schools: II. A
Comparative Study of Behavior Problems in
Two School Districts,’’ Nervous Children,
X (1954), pp. 415-424.


2. Del Solar, Charlotte. Parents and Teachers
View the Child (New York: Bureau of Publications,
Teachers College, Columbia University,
1949).


3. Glidewell, J. C., and others. ‘‘Methods for
Community Mental Health Research: I. Hypothesis
Formation,’’ American Journal of
Orthopsychiatry (in press).


4. Glidewell, J. C., and others. ‘‘Methods for
Community Mental Health Research: II. Research
Design and Controls,’’ American
Journal of Orthopsychiatry (in press).


5. Griffiths, W. Behavior Difficulties of Children
as Perceived and Judged by Parents,
Teachers, and Children Themselves (Minneapolis:
University of Minnesota Press,
1952).


6. Herbst, P. G. ‘‘The Measurement of Family
Relationships,’’ Human Relations, V
(1952), pp. 3-35.


7. Kelly, E. L., and Fiske, D. W. The Prediction
of Performance in Clinical Psychology
(Ann Arbor: University of Michigan
Press, 1951).


8. Meehl, P. E. Clinical vs. Statistical Prediction:
A Theoretical Analysis and a Review
of the Evidence (Minneapolis: University
of Minnesota Press, 1954).


9. Mott, Sina M. ‘‘Concept of Mother: A Study
of Four and Five-Year-Old Children,’’ Child
Development, XXIII (1954), pp. 99-106.


10. Rogers, C. R. Measuring Personality Adjustment
in Children Nine to Thirteen
Years of Age (New York: Bureau of Publications,
Teachers College, Columbia University,
1931).


11. Rogers, C. R., and others. ‘‘The Role of
Self Understanding in the Prediction of Behavior,’’
Journal of Consulting Psychology,
XII (1948), pp. 174-86.


12. Snedecor, G. W. Statistical Methods
(Ames, Iowa: Iowa State College Press,
1953).


13. Ullmann, C. A. Identification of Maladjusted
School Children, No. 7 (Washington,
D.C.: Public Health Monograph, 1952).


FOOTNOTES 


*This is a report on part of a Mental Health Re- 
search Program of the St. Louis County Health 
Department, supported by research grant 
M-592 from the National Institute of Mental 
Health of the National Institutes of Health, Unit- 
ed States Public Health Service. 

Grateful acknowledgment for their coopera- 
tion is made to School Superintendents Hugo E. 
Beck, Bayless School District; James Lindhurst, 
Hancock School District; and Charles J. Mesnier,
Affton School District. 


**St. Louis County Health Department, and De- 
partment of Psychiatry and Neurology, Wash- 
ington University School of Medicine. 


1. This test was developed by Rogers ‘‘while
on a fellowship at the Institute for Child Guidance,
New York City. The subjects used
were ‘problem’ children referred for inten- 
sive study and treatment. Briefly, the meth- 
od was as follows: Detailed ratings on each 
child were obtained from clinic workers— 
psychiatrists, psychologists, social workers 
—who had intimate knowledge of and contact 
with the child. These ratings were then compared
with the child’s responses on the test.
It was found that children with a poor group 
adjustment—those who felt inferior socially 
—tended to give certain responses. Day-
dreaming children tended to give other re- 
sponses. And so on with other types. From 
these typical responses it was possible to 
build up a scoring system which applied to 
other children.’’(10) 




STATEMENT REQUIRED BY THE ACT OF AUGUST 24, 1912, AS AMENDED BY THE ACTS 
OF MARCH 3, 1933, AND JULY 2, 1946 (Title 39, United States Code, Section 233) SHOWING 
THE OWNERSHIP, MANAGEMENT, AND CIRCULATION OF 


JOURNAL OF EXPERIMENTAL EDUCATION (State exact frequency of issue)
for October 1


1. The names and addresses of the publisher, editor, managing editor, and business managers are: 


MADISON, WISCONSIN
(Name of post office and State where publication has second-class entry)


Name
Publisher Dembar Publications, Inc. 
Editor 
Managing editor dean W. Williams 


2. The owner is: (If owned by a corporation, its name and address must be stated and also immediately thereunder the 
names and addresses of stockholders owning or holding 1 percent or more of total amount of stock. If not owned by a 
corporation, the names and addresses of the individual owners must be given. If owned by a partnership or other unincorpo- 
rated firm, its name and address, as well as that of each individual member, must be given.) 


Address 
Madison, Wisconsin 
Madison, Wisconsin. 
Madison, Wisconsin.
Madison, Wisconsin. 


3. The known bondholders, mortgagees, and other security holders owning or holding 1 percent or more of total amount 
of bonds, mortgages, or other securities are: (If there are none, so state.) 


Name 


4. Paragraphs 2 and 3 include, in cases where the stockholder or security holder appears upon the books of the company
as trustee or in any other fiduciary relation, the name of the person or corporation for whom such trustee is acting; also the
statements in the two paragraphs show the affiant’s full knowledge and belief as to the circumstances and conditions under
which stockholders and security holders who do not appear upon the books of the company as trustees, hold stock and
securities in a capacity other than that of a bona fide owner.


5. The average number of copies of each issue of this publication sold or distributed, through the mails or otherwise, to
paid subscribers during the 12 months preceding the date shown above was: (This information is required from daily,
weekly, semiweekly, and triweekly newspapers only.)
A. S. President


(Signature of editor, publisher, business manager, or owner)


Sworn to and subscribed before me this 22” 


[SEAL]


Address 
Business manager ---.-.-Madison, 
Name 
Address 


NOW AVAILABLE 


HANDBOOK OF PRIVATE SCHOOLS 
1958 Ed.—1248 pgs.—$10 copy 


DIRECTORY FOR EXCEPTIONAL CHILDREN 
facilities for handicapped, 3rd. Ed.—$6 


GUIDE TO SUMMER CAMPS AND SUMMER SCHOOLS 
12th Ed.—$3.30, cloth; $2.20, paper


TOYNBEE AND HISTORY, an evaluation, $5


PORTER SARGENT 
educational publisher, 45 years—11 beacon st., boston 




Specifications for Manuscripts 


for the... 


JOURNAL OF EDUCATIONAL RESEARCH 
and the... 


JOURNAL OF EXPERIMENTAL EDUCATION 
1. All manuscripts must be typewritten, double spaced, and on one side
of the sheet only. Mimeographed and ditto sheets are acceptable only
when very clearly printed.


2. All unusual symbols or formulae must be very clearly typed or hand
printed in black ink. To avoid costly printers’ composition charges it
may be necessary for us to make cuts of difficult matter, or to print
your material by the photo-offset lithography method. The latter means
photographing your actual copy. It is expensive to have material
redrawn by our own artists, and retracing or duplicating increases the
hazards of error. See that your copy is correct and complete as you
wish to have it reproduced. The men who work on your manuscripts
are not trained to understand the working symbols and language of
your technical field.


3. ... restrictions and requirements ...
drawings, graphs or other illustrated materials,—they must be neatly
... on paper or tracing cloth suitable for reproduction.
... our magazines are printed in black ink only. Color
graphs should be changed by the author to provide different kinds of
shading for the different areas. For example: diagonal lines for red,
vertical lines for blue, etc. Provide a key.


4. All tables, graphs, etc., on sheets by themselves must be properly labeled
and identified in relation to the written copy of the manuscript.


5. Footnotes must be complete as to author, title, place of publication,
publisher, date and pages. They must be numbered consecutively
throughout the article.


6. Bibliographical notes must be complete and arranged alphabetically.


The cooperation of all prospective authors in following these rules is
earnestly required. It is difficult to produce technical journals accurately,
neatly, and on time under the best conditions. Promptness in printing,
economy, and accuracy will be promoted by carefully prepared manuscripts.


